Analysing Simulation Data CECAM Workshop, Jülich, 14-15 October 2015

This two-day workshop on Analysing Simulation Data was part of the larger CECAM Macromolecular Simulation Software Workshop that I co-organised at the Forschungszentrum Jülich. It was the second workshop in the series and immediately followed an introductory Software Carpentry workshop.

Until a few years ago I analysed all my simulation data using either VMD, often by writing Tcl scripts, or one of the GROMACS g_ tools, if a suitable one was available. Then I started using MDAnalysis, a python module. This enabled me to do two things. First, MDAnalysis has its own analysis routines, so you can often do all the analysis you need in a simple python script. More powerfully, since it can read many different simulation formats, it also acts as a gateway to the huge range of powerful python modules. The net result is that I have been able to analyse my data in ways that previously would not have been practical (i.e. I would have had to write C code).
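
To give a flavour of the first point, here is a minimal sketch of an MDAnalysis script that follows the radius of gyration of a protein along a trajectory. The file names are hypothetical, the choice of observable is just for illustration, and the method names are those of recent MDAnalysis releases (older versions spelt some of them differently), so treat it as a sketch rather than a recipe.

```python
def radius_of_gyration_timeseries(topology, trajectory, selection="protein"):
    """Return a list of (time, radius of gyration) pairs for the selected atoms."""
    # Imported here so the function can be defined without MDAnalysis installed
    import MDAnalysis as mda

    universe = mda.Universe(topology, trajectory)
    atoms = universe.select_atoms(selection)

    results = []
    # Iterating over the trajectory updates the atom positions frame by frame
    for ts in universe.trajectory:
        # MDAnalysis reports times in ps and lengths in Angstrom
        results.append((ts.time, atoms.radius_of_gyration()))
    return results


# Hypothetical usage (placeholder file names, not files from this post):
# for time, rgyr in radius_of_gyration_timeseries("protein.gro", "protein.xtc"):
#     print(time, rgyr)
```

Because the result is a plain python list, it can be handed straight to numpy, scipy or matplotlib, which is the second point above.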

For example, we presented a paper (open access) at a Faraday Discussion meeting where we used the image-processing tools in scikit-image to analyse whether the presence of a small cell-signalling protein slowed the rate at which a three-component lipid bilayer phase separated. I posted some example code on GitHub. In other work, not all of it published, I have used scipy and numpy to, for example, calculate the power spectra of fluctuations in lipid bilayers (using fast Fourier transforms).
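
As an illustration of the power spectrum idea, the sketch below uses numpy to compute a radially averaged spectrum of a height field on a square grid. It is not the code from any of the work mentioned above: the grid size, box length and the synthetic single-mode surface are all assumptions made so that the example is self-contained.

```python
import numpy as np


def radial_power_spectrum(height, box_length):
    """Radially averaged power spectrum of a square (n x n) height field."""
    n = height.shape[0]
    # Subtract the mean so that only the fluctuations contribute
    fluctuation = height - height.mean()
    # 2D FFT, normalised by the number of grid points
    h_q = np.fft.fft2(fluctuation) / fluctuation.size
    power = np.abs(h_q) ** 2

    # Magnitude of the wavevector belonging to each FFT bin
    q_1d = 2 * np.pi * np.fft.fftfreq(n, d=box_length / n)
    qx, qy = np.meshgrid(q_1d, q_1d, indexing="ij")
    q = np.hypot(qx, qy)

    # Average the power in shells whose width is the smallest wavevector 2*pi/L
    dq = 2 * np.pi / box_length
    shell = np.rint(q / dq).astype(int).ravel()
    total = np.bincount(shell, weights=power.ravel())
    count = np.maximum(np.bincount(shell), 1)
    q_shell = dq * np.arange(total.size)
    # Drop the q = 0 shell, which is zero after removing the mean
    return q_shell[1:], (total / count)[1:]


# Synthetic surface (an assumption for illustration): a single cosine mode
n_grid, box = 32, 10.0
x = np.arange(n_grid) * box / n_grid
height = 0.1 * np.cos(2 * np.pi * 3 * x / box)[:, None] * np.ones((1, n_grid))

q, spectrum = radial_power_spectrum(height, box)
print(q[np.argmax(spectrum)])  # wavevector of the dominant mode
```

For a real bilayer the height field would come from binning lipid positions onto a grid for each frame and averaging the spectra over the trajectory.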

Aim

The workshop brought together researchers, especially PhD students and postdoctoral researchers, and academic software developers.

The hope was that the researchers would come out of it not only feeling more confident about developing their own software, and maybe even starting to contribute to an academic open source project, but also able to use the python ecosystem to analyse their data in new and interesting ways.

For the developers, the hope was that they would get to talk to a range of current and prospective users and gain a better understanding of how people are using their code (and maybe pick up some contributors along the way).

Structure

I felt that a traditional didactic approach wouldn’t work, so there were no sessions of talks plus questions. In the end I stole shamelessly from the excellent series of Collaborations Workshops run by the Software Sustainability Institute in the UK. The workshop worked towards, and culminated in, a HackDay. I now believe HackDays are a great way not only of teaching but also of building teams. I am writing a post about this for the SSI and will link to it here when it is up.

I invited developers from two biomolecular python projects: MDAnalysis and pmx. Given more budget I would have loved to invite other developers, e.g. from mdtraj. On the first day each project gave a short talk followed by around two hours of guided tutorial. Then at the end of day one, I invited participants to present analysis problems drawn from their own research that they would like to solve. Teams were allowed to form around six ideas. On day two these teams had around six hours to solve their problem, before presenting their solution to the rest of the workshop. The winning project, MDARTINI, aimed to make MDAnalysis more aware of the coarse-grained forcefield, MARTINI.

Feedback

Overall,

  • 94% of participants enjoyed the workshop.
  • 100% learnt something useful that will help their research.
  • 100% would recommend a workshop like this one to other researchers.
  • 88% feel confident enough to contribute to an academic open source project.

“I now understand enough to try using the following tools”

I then asked participants to respond to the statement “I now understand enough to try using the following tools”. Given that most participants had heard of MDAnalysis but only a few had used it – and very few had heard of pmx – this is an encouraging shift. This was followed up by “I intend using the tools and methods to help my research”. Usually the answers to this are a bit more pessimistic, since people might understand a tool but have no intention of using it. Here, though, it goes the other way.

“I intend using the tools and methods to help my research.”

To try and understand which parts of the workshop went well, I asked “I enjoyed the following components of the workshop”. The talks were rated ok, followed by the HackDay, but the tutorials and meeting other researchers were the most highly rated.

“I enjoyed the following components of the workshop.”

Finally, to find out if there were any practical problems I asked “The following elements contributed to making the workshop a success”.

“The following elements contributed to making the workshop a success.”

The big problem here was the network; we had better connectivity in the small hotel in Jülich. It turned out there was a problem with the wireless router in the room, which was fixed a few days after the workshop. Nor did many people like the location in Jülich. However, the various coffee breaks – which we were grateful to the Software Sustainability Institute for sponsoring – and the general social atmosphere were appreciated.

Lessons for next time

This type of workshop is very complicated and plenty can go wrong. Always have a Plan B. For example, assume that not everyone will be able to install all the necessary software on their laptops, so come prepared with a (Linux) virtual machine image that will work in all the tutorials. And don’t assume that the network will “just work”.

Software Carpentry Workshop, Jülich, 12-13 October 2015

Last week David Dotson from ASU and I ran a two-day Software Carpentry workshop to kick off the CECAM Macromolecular Simulation Software Workshop at the Forschungszentrum Jülich. The idea was to give participants who were less well versed in python and in working collaboratively with tools such as git a crash course to bring them up to speed for the following five mini-workshops. As you can imagine, coffee and tea are essential for running an intensive bootcamp and we owe thanks to the Software Sustainability Institute for sponsoring our coffee breaks.

As we were a self-organised workshop, there was no centrally coordinated survey of the participants to gauge their level of experience. So instead I sent out a questionnaire very similar to the one I’d sent before the first workshop I organised back in 2012. As is often the case, the learners were more comfortable with bash and simple python, but hadn’t heard of or used testing or version control. Interestingly, a higher proportion of learners were experienced in bash and python than at that previous workshop. Both groups were drawn from the biomolecular simulation community, so this may reflect an increasing level of expertise.

The workshop itself was the smoothest I’ve been involved in; I think it helped that both David and I have taught several now. Also, devoting three hours to each of bash and version control and then six hours to python (including coffee breaks) meant it wasn’t quite as rushed. The last workshop I taught was in January 2015, and since then the course materials have been overhauled, updated and separated from the workshop GitHub repository. The latest version of the materials seemed to work well.
That gap also meant I was unfamiliar with the evolution of ipython notebooks into jupyter notebooks, which David used to teach. Interestingly, although there was only one helper, Charlie Laughton, we were never overwhelmed. At each workshop I have taught or organised the ratio of helpers to learners has decreased, which may reflect improvements in installation and the course materials.
Finally, I was live coding on my Mac laptop and using the new Split View in OS X 10.11 worked really well.

That is what I thought: what about the learners? I had fifteen responses to the questionnaire, which was about a two-thirds response rate. All of them agreed with the statements “I enjoyed the Software Carpentry workshop” and “I feel I learnt something useful that will help my research”, but as we know, enjoyment does not necessarily translate into learning! As before I asked two key questions. First, “I now understand enough to try using the following tools/approaches”. The responses show a big shift in attitude compared to before the workshop, with the majority of people feeling that they understood the tools covered during the workshop.

But will this translate into a change in behaviour? To try and test this I also asked “I intend using the tools listed below to help my research”. The results are pretty similar but, interestingly, people’s intentions were stronger than their understanding, i.e. there was a slightly stronger response to the intention question than to the understanding question. Compared to the workshop I ran in Oxford in January 2015, the shift in behaviour was more dramatic, although the two groups were drawn from different research areas so can’t be directly compared.

CECAM Macromolecular simulation software workshop

I’m co-organiser of this slightly different CECAM workshop in October 2015 at the Forschungszentrum Jülich, Germany. Rather than following the traditional format of 3-4 days populated by talks with the odd poster session, this is an extended workshop made up of six mini-workshops. Since it is focussed on python-based tools for biomolecular simulations, of which there are an increasing number, the first mini-workshop will be a Software Carpentry bootcamp on which I will be lead instructor (helped by David Dotson from ASU). I’m also leading the next mini-workshop on analysing biomolecular simulation data.

Running GROMACS on an AMD GPU using OpenCL

I first used an Apple Mac when I was eight. Apart from a brief period in the 1990s when I had a PC laptop I’ve used them ever since.

Until last year I had an old MacPro which had four PCI slots so you could add a CUDA-capable NVIDIA card, although you were limited by the power supply. A GPU can accelerate the molecular dynamics code I use, GROMACS, by up to a factor of 2-3.

Unfortunately, when Apple designed the new MacPro, they put in AMD FirePro GPUs so although it is a lovely machine, you can’t run CUDA applications.

But this morning I saw that the next release candidate of GROMACS 5.1 supported OpenCL. Although OpenCL applications are usually a bit slower than CUDA applications, this would, in theory, allow me to accelerate GROMACS on my MacPro.

So I downloaded the code, compiled it with the appropriate OpenCL flag and it just works! I then ran two benchmarks that I use, one atomistic and one coarse-grained. Running on a single core, a single AMD FirePro D300 accelerated GROMACS by 2.0x and 2.5x for the atomistic and coarse-grained benchmarks, respectively.

Here’s looking forward to the final release of GROMACS 5.1!

New Publication: Alchembed

In much of my research I’ve looked at how proteins embedded in cell membranes behave. An important part of any simulation of a membrane protein is, obviously, putting it into a model membrane, often a square patch of several hundred lipid molecules. This is surprisingly difficult: although a slew of methods have been published, none of them can embed several proteins simultaneously into a complex (non-flat) arrangement of lipids, for example a virus, as shown in our recent paper.

Here we introduce a new method, dubbed Alchembed, that uses an alternative way, borrowed from free energy calculations, of “turning on” the van der Waals interactions between the protein and the rest of the system. We show how it can be used to embed five different proteins into a model vesicle on a standard workstation. If you want to try it out, there is a tutorial on GitHub. This assumes you already have GROMACS set up.

You can get the paper for free from here.

Is Software a Method?

Last month I went to the Annual Meeting of the US Biophysical Society. As a Software Sustainability Institute fellow I was interested not only in my research area, but also in how my community viewed software. Were there talks and posters on how people had improved important pieces of community software? After all, there would be talks and posters on improving experimental methods. Turns out, not so much. Click here to read the full post.

HackDay: Data on Acid

Every year the Software Sustainability Institute (SSI) run a brilliant meeting called the Collaborations Workshop, usually in Oxford. This is an unconference lasting two days. At first glance it doesn’t look like it would be relevant to my research, but I always learn something new, meet interesting people and start, well, collaborations. The latest edition was last week and was the fourth I’ve attended. (Disclaimer: for the last year-and-a-bit I’ve been an SSI fellow which has been very useful – this is how I managed to train up to be a Software Carpentry Instructor. Alas my tenure has now ended).

For the last two years the workshop has been followed by a hackday, which I’ve attended. Now, I’m not a software developer; I’m a research scientist who uses million-line community-developed codes (like GROMACS and NAMD), but I do write code, often python, to analyse my simulations and also to automate my workflows. A hackday, therefore, where many of the participants are research software engineers, pushes me clear out of my comfort zone. I remember last year trying to write python to access GitHub using its API and thinking “I’ve never done anything like this before and I’ve no idea what to do.” This year was no different, except I’d pitched the idea so felt responsible for the success of the project.

The name of the project, Data on Acid, was suggested by Boris Adryan and the team comprised myself, Robert Haines, Alys Brett, Joe Parker and Ian Emsley. The input was data produced by a proof-of-principle project I’ve run to test whether I can predict if individual mutations to S. aureus DHFR cause resistance to trimethoprim. The idea was then to turn the data into abstract forms, either visual or sound, so you can get an intuitive feel for it. Or it could just be aesthetic.

To cut a long story short, we did it, it is up on GitHub and we came third in the competition! In the long term I’d like to develop it further and incorporate it into my volunteer crowd-sourced project, bashthebug, that aims to predict whether bacterial mutations cause antibiotic resistance or not (when it is funded that is).

Installing GROMACS with MPI support on a Mac

GROMACS is an optimised molecular dynamics code, primarily used for simulating the behaviour of proteins. To compile GROMACS you need, well, some compilers. I install gcc using MacPorts. Note that this requires you to first install Xcode. Then it is easy to install gcc version 4.9 by

sudo port install gcc49

(and yes, I know about Homebrew, but I still find MacPorts has more of the things I want than brew). So, once you’ve done a bit of preparation, compiling vanilla GROMACS from source on a Mac is easy. Once you’ve downloaded the source code tarball, run

tar xvf gromacs-5.0.2.tar.gz
cd gromacs-5.0.2
mkdir build
cd build
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/5.0.2
make
sudo make install

Note that this will install it in /usr/local/gromacs/5.0.2, so you can keep multiple versions on the same machine and swap between them in a sane way by sourcing the relevant GMXRC file, for example

source /usr/local/gromacs/4.6.7/bin/GMXRC
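
If you end up with several versions installed this way, a few lines of python can list them and build the matching source command. This is only a sketch that assumes the one-directory-per-version layout described above; the function names are made up for illustration.

```python
import os


def installed_gromacs_versions(prefix="/usr/local/gromacs"):
    """List the version directories under `prefix` that contain bin/GMXRC."""
    if not os.path.isdir(prefix):
        return []
    # Note: the sort is alphabetical, not a true version-number ordering
    return sorted(
        name for name in os.listdir(prefix)
        if os.path.isfile(os.path.join(prefix, name, "bin", "GMXRC"))
    )


def source_command(version, prefix="/usr/local/gromacs"):
    """Build the shell command that activates one particular version."""
    return "source " + os.path.join(prefix, version, "bin", "GMXRC")


print(installed_gromacs_versions())
print(source_command("4.6.7"))
```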

Adding MPI support on a Mac is trickier. This appears mainly to be because the gcc compilers from MacPorts (or clang from Xcode) don’t appear to support OpenMPI. You will know because when you run the cmake command you get a load of failures starting about ten lines down, such as

-- Performing Test OpenMP_FLAG_DETECTED - Failure

I managed to get a working version using the following approach. There are likely better ones (if you know of one, please leave a comment), but it has the virtue of working. First we need to install OpenMPI.

sudo port install openmpi

Now we need a compiler that supports OpenMPI. If you dig around in the MacPorts tree you can find some.

sudo port install openmpi-devel-gcc49

Finally, we can follow the steps above (I just make a build-mpi subfolder in the above source folder and cd into it), but now we need a (slightly) more complex cmake instruction

cmake .. -DGMX_BUILD_OWN_FFTW=ON \
  -DGMX_BUILD_MDRUN_ONLY=ON \
  -DCMAKE_INSTALL_PREFIX=/usr/local/gromacs/5.0.2 \
  -DGMX_MPI=ON -DCMAKE_C_COMPILER=mpicc-openmpi-devel-gcc49 \
  -DCMAKE_CXX_COMPILER=mpicxx-openmpi-devel-gcc49 \
  -DGMX_SIMD=SSE4.1

This will only build an MPI version of mdrun (which makes sense) and will install mdrun_mpi alongside the regular binaries we compiled first. We have to tell cmake what all the fancy new compilers are called and, unfortunately, these don’t support AVX SIMD instructions, so we have to fall back to SSE4.1. Experience suggests this doesn’t impact performance as much as you might think. Now you can run things like Hamiltonian replica exchange on your workstation!

A simple tutorial on analysing membrane protein simulations.

I’m teaching a short tutorial on how to analyse membrane protein simulations next week at the University of Bristol as part of a series arranged by CCPBioSim. As it is only 90 minutes long, it only covers two simple tasks, but I show how you can do both with MDAnalysis (a python module) or in Tcl in VMD. Rather than write something and just distribute it to the people who are coming to the course, I’ve put the whole tutorial, including trajectory files and all the example code, here on GitHub. Please feel free to clone it, make changes and send a pull request (or just send me any comments).

Getting an ext3 Drobo 5D to play nicely with Ubuntu 12.04

Our lab has recently bought two Drobo 5Ds to give us some large storage. They work out of the box with Macs, but getting them to play nicely with Linux, specifically Ubuntu 12.04, has been a bit more work, so I thought I’d share the recipe that, for us at least, appears to work. Much of this has been cobbled together from the drobo-utils page and also from a very helpful earlier blog post. One thing I could not get to work, unfortunately, is USB3. There appeared to be problems with USB3 and Linux when I was trying this out. Finally, I should mention that the Drobo here was set up on a Mac, so was formatted HFS+ to begin with. And, of course, follow these commands at your own risk. They worked for me, but they might not work for you.

First plug the Drobo into the power and connect with the USB lead to your Ubuntu machine. Don’t use any blue USB ports – these are USB3 and I couldn’t get them to work with the Drobo. After a while the Drobo should appear as a USB disk drive in a window. You can check what Ubuntu is doing by looking at this log

$ dmesg | tail

It will show something like

[250886.772714] usb 1-1.1: new high-speed USB device number 10 using ehci_hcd
[250887.331458] scsi19 : usb-storage 1-1.1:1.0
[250888.328628] scsi 19:0:0:0: Direct-Access Drobo 5D 5.00 PQ: 0 ANSI: 0
[250888.329605] sd 19:0:0:0: Attached scsi generic sg3 type 0
[250888.330168] sd 19:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).

First we need to install the latest version of the Linux Drobo tools, so we will probably need git, and let’s get Qt as well so we can check the GUI.

$ sudo apt-get install git
$ sudo apt-get install python-qt4

Now cd to somewhere where you put packages etc and run

$ git clone git://drobo-utils.git.sourceforge.net/gitroot/drobo-utils/drobo-utils

This will download all the files and binaries you need

$ cd drobo-utils/

Just check it is all up to date

$ git pull

Check it is all working by seeing if this works (warning: this can take about a minute)

$ sudo ./drobom status

In theory, we can bring up the GUI as below, but on my machine I just got python errors about KeyError: 'UseStaticIPAddress'. Check it if you want.

$ sudo ./drobom view

Next we need to know which device the Drobo is currently plugged into. This will probably change every time you plug the Drobo in.

$ ls -lrt /dev/disk/by-uuid/

There should be a long alphanumeric string, which I will call foo, that is pointing to something like /dev/sdb. It should match the name that appears when I type

$ ls /media/

If so, then we know that the Drobo is connected to /dev/sdb. Next we need to set the Logical Unit (LUN) size. This is the largest volume the Drobo will appear as, and if we run df it will show this as the physical size of the Drobo even if there are not enough disks inside to make it this size. Since the Drobo 5D has five slots and we are using 4 TB disks at present, if we run with single-disk redundancy the maximum size is 16 TB. You could make this smaller, but then you would have multiple “drobo partitions” mounted, all pointing to the same Drobo. The disadvantage of a large LUN size is that the startup time is long, as is any disk checking time. The units in the line below are TB! Caution: these commands can take a while to run and I’ve not pasted in the usual “are you sure?” prompts.

$ sudo ./drobom set lunsize 16 PleaseEraseMyData
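
The 16 TB figure is simply the number of slots, minus the disks given over to redundancy, times the disk size. As a sanity check, here is that arithmetic as a small python sketch. It assumes identical disks and ignores any overhead, so the real usable space reported by the Drobo will be somewhat smaller; the function name is made up for illustration.

```python
def max_usable_tb(n_slots, disk_tb, redundant_disks=1):
    """Rough upper bound on usable capacity in TB, assuming identical disks."""
    if not 0 <= redundant_disks < n_slots:
        raise ValueError("redundant_disks must be between 0 and n_slots - 1")
    return (n_slots - redundant_disks) * disk_tb


# Five slots filled with 4 TB disks and single-disk redundancy
lun_size_tb = max_usable_tb(n_slots=5, disk_tb=4)
print(lun_size_tb)  # 16
```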

Now we need to set up a partition for the disk using parted, which should already be installed. This has its own command line. Although we are setting up an ext3 disk, it seems ext3 is just ext2 with journalling, so we ask parted for an ext2 disk.

$ sudo parted /dev/sdb
GNU Parted 1.8.9
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel
New disk label type? gpt
(parted) mkpart ext2 0 100%
(parted) quit
Information: You may need to update /etc/fstab.

Now we need to format the disk. Again remember to use the right device. Also note it is sdb1 since we are formatting the first and only partition, not the disk itself. Also note again we are formatting as ext2 but with the -j flag for journalling, hence ext3. Again, this will ask whether you are sure etc and could take a few hours.

$ sudo mke2fs -j -i 262144 -L Drobo -m 0 -O sparse_super,^resize_inode /dev/sdb1
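
The -i option sets the bytes-per-inode ratio, i.e. mke2fs creates one inode for every 262144 bytes of space, which suits a volume holding mostly large files. The python sketch below estimates how many inodes that gives. It assumes a nominal 16 TB (decimal) volume, whereas the real partition will be a little smaller, so take the number as a rough guide only.

```python
def estimated_inodes(capacity_bytes, bytes_per_inode=262144):
    """Approximate number of inodes mke2fs creates for a given -i ratio."""
    return capacity_bytes // bytes_per_inode


TB = 10**12  # decimal terabyte, an assumption; disks are usually sold in these units
print(estimated_inodes(16 * TB))  # 61035156
```

Roughly 61 million inodes caps the number of files at about the same figure, which is plenty for trajectories but worth knowing if you plan to store many tiny files.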

Nearly there. If you remount the Drobo it should appear in /media/Drobo (or whatever name you gave it above). Now we need to make sure you have permission to write to the disk. For this we need to know your numeric user and group ids.

$ id
uid=9009(fowler) gid=100 groups=100

So my user id is 9009 and my group id is 100. Hence

$ sudo chown -R 9009:100 /media/Drobo/
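
If you would rather not copy the numbers by hand, the same command can be assembled in python from the ids of whoever runs the script. This is just a sketch: it only builds and prints the command rather than running it, and the function name is made up for illustration.

```python
import os


def chown_command(path, uid=None, gid=None):
    """Build the recursive chown command, defaulting to the current user and group."""
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return ["sudo", "chown", "-R", f"{uid}:{gid}", path]


# Using the ids from the example above
print(" ".join(chown_command("/media/Drobo/", uid=9009, gid=100)))
```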

If we want to mount the Drobo somewhere else, we need to edit /etc/fstab. First we need to know the UUID of the disk (this was the foo).

$ ls -lrt /dev/disk/by-uuid/

Copy the foo into the clipboard and open

$ sudo emacs /etc/fstab

Add a line at the end that looks like

# mount the ext3 Drobo
UUID=b278aff6-db1a-436b-995b-8808c2c82f9e /drobo1 ext3 defaults 0 2

Make sure the mount point exists!

$ sudo mkdir /drobo1
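
The UUID lookup and the fstab entry can also be sketched in python, which is handy if you have more than one Drobo to set up. The function names here are made up for illustration, the lookup assumes the usual /dev/disk/by-uuid directory of symlinks, and nothing is written to /etc/fstab; the line is only printed.

```python
import os


def uuid_for_device(device, by_uuid_dir="/dev/disk/by-uuid"):
    """Return the UUID whose symlink resolves to `device`, or None if not found."""
    if not os.path.isdir(by_uuid_dir):
        return None
    target = os.path.realpath(device)
    for name in sorted(os.listdir(by_uuid_dir)):
        if os.path.realpath(os.path.join(by_uuid_dir, name)) == target:
            return name
    return None


def fstab_line(uuid, mount_point, fs_type="ext3", options="defaults",
               dump=0, fsck_pass=2):
    """Format a single /etc/fstab entry that mounts a filesystem by UUID."""
    return f"UUID={uuid} {mount_point} {fs_type} {options} {dump} {fsck_pass}"


# The UUID from the example above; on a real machine you might instead call
# uuid_for_device("/dev/sdb1") and check that it did not return None
print(fstab_line("b278aff6-db1a-436b-995b-8808c2c82f9e", "/drobo1"))
```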

Now remount the disk, either by rebooting or by issuing

$ sudo mount -a

and voila, you should find the disk by

$ ls /drobo1/

Check you can make a file

$ touch /drobo1/hello-world.txt

Check it appears in your list of disks using df etc. I’ve checked you can be a bit rough with it e.g. just pulling out the USB cable and then reconnecting to a different port. Seemed ok but I did need to remount it using

$ sudo mount -a

and then I could add and edit files as normal, and there was no complaining in dmesg about read-only filesystems or anything like there was before with HFS+.