Women in Computer Science Day

Last week I ran a small stall at the annual Women in Computer Science day run by the Department of Computer Science at the University of Oxford. Fortunately, being neither a woman nor a computer scientist proved not to be a problem. The event was aimed at female Year 10 students, who will be choosing their A-levels next academic year. The UK overhauled its computer science curriculum a few years ago, and it was great to see that more than half of the students had already done some Python, which is awesome. (I am teaching my eight-year-old Python.) I was demonstrating the image-processing software I’ve written (in Python) that detects growth of M. tuberculosis on a 96-well plate (see the photo above). It can also automatically cut the photos up into strips, each showing a single drug, which are then uploaded to BashTheBug, a Zooniverse citizen science project. An important, and really easy, message was therefore: it is amazing what you can do with Python! There is no need to learn an esoteric programming language such as Ook! And yes, there are often times when you do need the performance of a lower-level language, but often you don’t. As my own career path shows, nor do you have to study Computer Science to do science with computers.
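The software itself isn’t shown here. As a rough sketch of the “cut into strips” step only, suppose the photo has already been cropped and straightened so that the plate fills the frame. Suppose also (hypothetically) that each drug occupies one column of wells. Splitting a NumPy image array into strips then takes a few lines. The function name and the 12-strip default are illustrative and are not taken from the real code.

```python
import numpy as np


def cut_into_strips(plate_image, n_strips=12):
    """Split an image array into n_strips vertical strips of (nearly) equal width.

    Hypothetical layout: assumes the plate has been cropped to fill the frame and
    that each drug occupies one column of wells.
    """
    # axis=1 is the image width; np.array_split copes with widths that
    # are not an exact multiple of n_strips
    return np.array_split(plate_image, n_strips, axis=1)


# A blank 800 x 1200 RGB array stands in for a real photograph
fake_plate = np.zeros((800, 1200, 3), dtype=np.uint8)
strips = cut_into_strips(fake_plate)
print(len(strips), strips[0].shape)  # 12 (800, 100, 3)
```

Each strip could then be saved as its own image. In practice the hard part is locating the plate and its wells reliably, which this sketch skips entirely.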

As a Software Carpentry instructor, I find it great that first-year undergraduates (or at least UK ones) might turn up at registration already happy using Python. In a way, Software Carpentry is one of those organisations whose ultimate goal should be to bring about its own irrelevance. It also makes me think of how much more imaginatively we could teach. We could bring experimentation into theory. If you don’t believe that the Binomial distribution can be approximated by a Gaussian for large enough numbers, try it! Or, if you don’t believe that the integral of a simple quadratic is a cubic, try it using many, many small trapeziums. (My maths class was set this, and a group of us wrote BASIC programs that hummed overnight on our Ataris and Commodores as we tried to outdo each other on the number of trapeziums the interval was divided into. Geeky, I know.)
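Both “try it” experiments fit in a few lines of Python rather than BASIC. The sketch below is a minimal illustration: the function names are my own, and the choice of x² on [0, 3] and of 100 trials with p = 0.5 are arbitrary examples.

```python
import math


def trapezium_integral(f, a, b, n):
    """Approximate the integral of f from a to b with n equal-width trapeziums."""
    h = (b - a) / n
    # The two end points are weighted by 1/2, every interior point by 1
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def binomial_pmf(k, trials, p):
    """Probability of exactly k successes in `trials` independent trials."""
    return math.comb(trials, k) * p**k * (1 - p) ** (trials - k)


def gaussian_pdf(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


# The exact integral of x**2 from 0 to 3 is 3**3 / 3 = 9; watch the error shrink
for n in (10, 100, 1000):
    print(n, trapezium_integral(lambda x: x**2, 0, 3, n))

# Compare Binomial(trials, p) with a Gaussian of the same mean and variance
trials, p = 100, 0.5
mu, sigma = trials * p, math.sqrt(trials * p * (1 - p))
worst = max(
    abs(binomial_pmf(k, trials, p) - gaussian_pdf(k, mu, sigma))
    for k in range(trials + 1)
)
print(f"largest difference for {trials} trials: {worst:.5f}")
```

Rerunning with a larger `trials` or `n` is the whole experiment: watching the differences shrink is more convincing than being told that they do.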

If you then couple that with cloud computing, you can start to say things like: “Ok, let’s all now try assembling a human genome”. How much more inspiring is that?

Software Carpentry Workshop, Oxford, 9-10 January 2017

Earlier this week I instructed the first Software Carpentry workshop run by the Reproducible Research Oxford project. This is a one-year project supported by the IT Innovation Challenges Fund and the Social Sciences Division. It is led by Laura Fortunato and I’m a member of the project team. One of the project’s main aims is to embed Software and Data Carpentry within the University of Oxford. The workshop was held at The Oxford Research Centre for Humanities, and we had around 25 learners from a wide range of backgrounds, from clinical medicine to the Bodleian Library.

There are only a handful of qualified instructors within the University (my co-instructor was Iain Emsley, whom you can see on the right in the photo above). One of the main objectives of the project is therefore to train additional Software and Data Carpentry instructors, so that workshops like these become self-sustaining. For me, one of the key defining characteristics of Software Carpentry is that “everyone was a learner once”; in other words, yesterday’s learner is today’s helper and tomorrow’s instructor. So creating a core of instructors, along with running a series of workshops, will, we hope, bootstrap the process within the University.

If you read the About page on the Software Carpentry website, it says:

Since 1998, Software Carpentry has been teaching researchers in science, engineering, medicine, and related disciplines the computing skills they need to get more done in less time and with less pain

I don’t agree with this; I think researchers in the Humanities and Social Sciences have just as much, if not more, to gain from learning some of the skills we teach. Reaching academics from these disciplines will give this project a unique “Oxford flavour” and, I anticipate, may lead us to develop bespoke material, much as Data Carpentry has done.

DTC Bioinformatics Module – Hackathon!

Last month I organised the Bioinformatics Module for the Oxford Interdisciplinary Bioscience Doctoral Training Partnership. It immediately followed the DTC Programming Module, part of which I also taught. This was the first year I’ve organised this three-week module. I was fortunate that five teams of lecturers from the previous year were all committed to teaching a one- or two-day course on a specific part of bioinformatics. So part of the job was making sure we had a handbook, a schedule and even a logo.

The main change I introduced was to replace the project in the third week with a Hackathon. Most of my experience of these has been through the excellent Collaborations Workshops run by the Software Sustainability Institute in the UK, although last year I ran my first hackathon as part of a CECAM meeting in Jülich, Germany.

I am a big fan of Hackathons, because you work within a small, self-assembled team to produce something useful, often in Python, and store it in GitHub. Hackathons are deliberately very open-ended and unconstrained. As a result, you usually propose something that is far too hard to do in the time available, and it feels really uncomfortable because you don’t know how to start or how to help your team. At some point, hopefully before halfway, you begin to see that you’ll be able to craft something from the mix of skills and experience in your team, and you start to feel better. Afterwards, I usually realise it was a very intense experience and something I’ll remember for a long time.

We deliberately set up the course so that everyone would have read and presented a paper that could lead into a Hackathon project. All the lecturers also suggested some possible projects, and some also provided initial genomic data. As a teacher, running a Hackathon is scary precisely because so much is left to the students. Will they engage, or will it fall flat on its face? Given that most of them hadn’t done any coding before the preceding Programming Module, would it be too much too fast? As is traditional, I had some prizes, thanks to the generous support of Microsoft Research, GitHub and Oxford Nanopore. I emailed all three companies about a month before the Hackathon, thinking I’d be lucky to get some stickers from one of them. All three replied and sent me some amazing prizes. This really helped create a buzz on the first day and helped the students engage.

One of the prizes donated by Microsoft.

The first obstacle was forming teams; in my experience, four or five people per team is optimal. Any more and the team starts to fragment; any fewer and you have too much to do (although three can work if you have the right mix of skills). We ended up with teams of 3, 4, 4 and 6. The large team split, with each sub-team working on a separate task, before coming back together ahead of the final day.

Overall, I was amazed how well it went; all the students got very involved. One team pretended to be stuck on the moon and had to use a MinION to work out who was infected with a specific plasmid. The acid test was that they had no lectures or practicals scheduled on Wednesdays, yet when I came in on Thursday it transpired they’d spent all of Wednesday working in their Hackathon teams. I asked some specific questions about what they thought of the Hackathon; the results are below.

Student feedback

As you can see, it was broadly positive. I asked “What did you enjoy most about the course?” and half the students said the hackathon. So, although it requires more preparation and thought, and is more nerve-wracking as a teacher, I think collaborative open-ended projects like this are the future. We all ranked the projects and the winners were “Small-But-Perfectly-Formed” (see the feature image) with a protein-protein docking and molecular dynamics study, which was very impressive given the short time they had. Thanks to Phillip Stansfeld (Biochemistry) who reserved part of his computing cluster for them to run the molecular dynamics simulations.

The winners: Steven Fiddaman, Robert Dixon and Oliver Adams

DTC Programming Module – Feedback

Last month I finished lecturing part of the Programming Course that all Doctoral Training Centre DPhil students take at the start of their first year. It is the first time I’ve helped teach the course, so I thought I’d record some of the feedback I collected here. The course introduces C and Python simultaneously. This helps illustrate the relative strengths and weaknesses of each language, but it is inevitably more challenging than concentrating on a single language, especially for beginners. Since the teaching rooms are not big enough for everyone, we split the cohort into two groups. I took the group with less programming experience (although there were definitely some ringers in the audience). In keeping with this, two-thirds of the 28 respondents had not done any form of programming before. Encouragingly, everyone thought that programming is essential for biology in the 21st century. So did they like the course?

I enjoyed the course

Overall, yes, 73% agreed that they enjoyed the course and 69% found the lectures engaging and interesting. Jolly good, but what specifically did they like? A clear majority (83%) found the practicals helped to embed the concepts introduced during the lectures,

The practicals helped me understand the concepts introduced during the lecture.

whilst an identical proportion thought the comics (mainly xkcd) were not a waste of time.

I thought the comics were a waste of time.

I borrowed a few ideas from Software Carpentry and, since my lecture room has dual projection, did live coding on one screen with the slides on the other. Slightly fewer people, 72%, but still a good majority, liked this.

I liked the lecturer doing “live coding”.

Since the course was run in a conventional computing laboratory, I also encouraged students, where possible, to try some programming on their own laptops by installing Anaconda or Enthought Canopy. I believe that it is only when you can play with code in your own time, on a machine you are comfortable with, that you really begin to get it. I was pleasantly surprised to see that 49% of the students had indeed done this, which is high given that not all of them have laptops yet. With any course like this, however, the key question is whether their intentions have changed. Given that two-thirds had never coded before, it was encouraging to see that 82% intended to use Python during their DPhil.

I intend using Python during my DPhil.

This contrasted sharply with C: only 6% (just two people) stated that they intended to use C.

I intend using C during my DPhil.

This probably partly reflects my own preference for “Python where possible, C only if you have to”. As the comments showed, people tended to ask either that we teach Python first and then C, or that we not teach C at all. With hindsight, it is not surprising that teaching two quite different languages simultaneously was challenging, given that most of the group had never programmed before. Some of the other comments were:

Do not do Python and C at the same time.

Having Python & C side-by-side was just confusing – would be better to try and become proficient in one (Python!) only

Learning two languages simultaneously is very difficult.

Just teach PYTHON

Finally, the aim of this course can’t be to teach programming in two weeks. Rather, it should make students familiar enough with programming that, should they encounter a problem during their DPhil, they try to solve it themselves and aren’t daunted by the prospect. Overall, 59% thought they knew enough to write a simple program, with 28% still not sure.

I know enough to write a simple program.

Lots to think about, and hopefully, feed back into the course for next year.

DTC Programming Course

Advanced Resources for the Curious


Installing python on your laptop

A good guide is SciPy’s installation instructions. On my Mac I install python and its modules using MacPorts. For Windows I’d use Enthought Canopy or Anaconda. These also work on Mac and Linux, and give you python plus a broad distribution of modules, including numpy and scipy.
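Once a distribution is installed, a short script can check what you actually have. The sketch below uses only the standard library (`importlib.metadata` needs Python 3.8 or later). The function name is my own, and the list of modules is just an example.

```python
import importlib.metadata
import importlib.util
import sys


def installed_versions(module_names):
    """Return {name: version string, "unknown", or None if not importable}."""
    versions = {}
    for name in module_names:
        if importlib.util.find_spec(name) is None:
            versions[name] = None  # not installed in this Python
            continue
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Importable (e.g. part of the standard library) but not
            # registered as an installed distribution under this name
            versions[name] = "unknown"
    return versions


print("Python", sys.version.split()[0])
for name, version in installed_versions(["numpy", "scipy", "matplotlib"]).items():
    print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Running this with the same Python you will use for your work is the point. Having several Pythons on one laptop is a common source of “but I installed it!” confusion.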

Free Geeky Resources

GitHub do a great Student Developer Pack which includes some free Amazon Web Services time and also a GitHub account with unlimited repositories.

Linux tips

  • TAB autocomplete
  • use the up and down ARROW keys to cycle through previous commands
  • typing history (in lowercase) displays your previous commands. Each has an associated number, e.g. 534.
  • typing !534 will re-run that command

2017 PhD projects advertised

If you are interested in helping combat antibiotic resistance and want to work on an interdisciplinary computational project with the possibility of strong public engagement (through bashthebug.net), please check out this project

These are part of the annual Nuffield Department of Medicine Prize Studentship competition at the University of Oxford and are fully funded four-year studentships open to any nationality. This is how to apply, but feel free to contact me informally with any questions. The deadline is noon, 6th January 2017. Alternatively, if you are part of the Doctoral Training Centre, or have your own funding, and are interested in putting together your own bespoke project, please drop me a line.

Within the Modernising Medical Microbiology group, these studentships are also being advertised.

Cheltenham Science Festival

A bit over a week ago I helped run the Modernising Medical Microbiology stall at the Cheltenham Science Festival. This was my first time helping to explain antibiotic resistance to, well, anyone and everyone. I come from a molecular background, and we didn’t have any material about protein structure. I therefore put together something explaining how mutations in the bacterial genome can prevent an antibiotic from binding to its target protein, thereby giving rise to resistance.

How protein mutations can give rise to antibiotic resistance

The chosen medium for this was LEGO (DUPLO, to be precise). I wanted to let younger children play with the DUPLO, and then I could use the LEGO models to explain it to older children (and adults). This is shown above.

Then, for older children, we could move on to a real protein structure with an antibiotic bound (using the same colours as the LEGO to make the link obvious). Under the hood this is VMD, but I coded a simplified GUI to make it easier to use. In the surface representation shown, you can turn the antibiotic on and off. This shows clearly how well it fits into the protein, and so how a small change could be enough to disrupt binding.


Overall it went well and we had a constant stream of people at our stall. What struck me was how genuinely fascinated some of the children were; at one point I turned around to find a nine-year-old rotating the protein on my laptop. With children like this you could (try to) explain concepts way beyond the national curriculum, such as atomic theory and molecules. We had some mini GiantMicrobes; the “superbug” MRSA with its cape was a favourite. The children who were really interested loved being given one, and, I hope, it may have lit the touch paper for an interest in science.

Analysing Simulation Data CECAM Workshop, Jülich, 14-15 October 2015


This two-day workshop on Analysing Simulation Data was part of the larger CECAM Macromolecular Simulation Software Workshop at the Forschungszentrum Jülich, which I co-organised. It was the second mini-workshop and immediately followed an introductory Software Carpentry workshop.

Until a few years ago I analysed all my simulation data either with VMD, often by writing Tcl scripts, or with one of the GROMACS g_tools, if a suitable one was available. Then I started using MDAnalysis, a python module. This enabled me to do two things. First, MDAnalysis has its own analysis routines, so you can often do all the analysis you need in a simple python script. More powerfully, since it can read many different simulation formats, it can also act as a gateway to the huge range of powerful python modules. The net result is that I have been able to analyse my data in ways that previously would not have been practical (i.e. I would have had to write C code).
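The following is a hedged illustration of that gateway idea; it is not code from this workshop or from MDAnalysis itself. Once coordinates have been pulled out of a trajectory into a NumPy array, any NumPy or SciPy routine can be applied to them. The commented lines show one way the array might be filled with MDAnalysis (`Universe`, `select_atoms`, iterating over `u.trajectory`, `AtomGroup.positions`). The filenames are hypothetical, and a random walk stands in for real data so the example runs on its own.

```python
import numpy as np


def rmsd_to_first_frame(coords):
    """Root-mean-square deviation of every frame from frame 0.

    coords has shape (n_frames, n_atoms, 3). No rotational or translational
    fitting is done, so this is only meaningful for pre-aligned trajectories.
    """
    displacement = coords - coords[0]  # broadcasts frame 0 over all frames
    return np.sqrt((displacement**2).sum(axis=2).mean(axis=1))


# With MDAnalysis the array could be filled along these lines (not run here):
#   import MDAnalysis as mda
#   u = mda.Universe("topology.gro", "trajectory.xtc")  # hypothetical files
#   ca = u.select_atoms("name CA")
#   coords = np.array([ca.positions.copy() for ts in u.trajectory])
# A random walk of 50 frames x 20 atoms stands in for real data here.
rng = np.random.default_rng(0)
coords = np.cumsum(rng.normal(scale=0.1, size=(50, 20, 3)), axis=0)
print(rmsd_to_first_frame(coords)[:5])
```

MDAnalysis has its own RMSD tools that include fitting. The point here is only that, once the data are in a plain array, the whole scientific Python stack is available.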

For example, we presented a paper (open access) at a Faraday Discussion meeting in which we used the image-processing tools in scikit-image to analyse whether a small cell-signalling protein slowed the rate at which a three-component lipid bilayer phase-separated. I posted some example code on GitHub. In other work, not all of it published, I have used scipy and numpy to calculate, for example, the power spectra of fluctuations in lipid bilayers (using fast Fourier transforms).
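The code behind that analysis isn’t reproduced here. Below is a hedged sketch of the general technique, computing a radially averaged power spectrum with `numpy.fft`. It assumes the bilayer’s height has already been interpolated onto a regular square grid with periodic boundaries, a step this example skips. The function name, the normalisation and the binning choices are my own, and the nm units in the test input are illustrative.

```python
import numpy as np


def radial_power_spectrum(height, box_length, n_bins=20):
    """Radially averaged power spectrum of a square, periodic 2D height field.

    height: (N, N) array of membrane heights on a regular grid.
    box_length: side length of the periodic box (same length units as height).
    Returns bin-centre wavenumbers q and the mean |h(q)|**2 in each q bin.
    """
    n = height.shape[0]
    # Subtract the mean height, then normalise so sum(|h_q|**2) == mean(h**2)
    h_q = np.fft.fft2(height - height.mean()) / n**2
    power = (np.abs(h_q) ** 2).ravel()
    # Angular wavenumbers 2*pi*k/L along each axis, in FFT ordering
    k = 2 * np.pi * np.fft.fftfreq(n, d=box_length / n)
    qx, qy = np.meshgrid(k, k, indexing="ij")
    q = np.hypot(qx, qy).ravel()
    # Average the power over annuli of equal width in |q|
    edges = np.linspace(0.0, q.max(), n_bins + 1)
    summed, _ = np.histogram(q, bins=edges, weights=power)
    counts, _ = np.histogram(q, bins=edges)
    mean_power = summed / np.maximum(counts, 1)  # empty bins report zero
    return 0.5 * (edges[1:] + edges[:-1]), mean_power


# Test input: a single cosine wave with 3 periods across a 10 nm box
L, N = 10.0, 64
x = np.arange(N) * L / N
X, _ = np.meshgrid(x, x, indexing="ij")
heights = np.cos(2 * np.pi * 3 * X / L)
q, p = radial_power_spectrum(heights, L)
# The strongest bin should lie within one bin width of 2*pi*3/L (about 1.885)
print(q[np.argmax(p)], 2 * np.pi * 3 / L)
```

For real bilayers you would average spectra over many frames before comparing with theory; a single frame is very noisy.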


The workshop brought together researchers, especially PhD students and postdoctoral researchers, and academic software developers.

The hope was that the researchers would come out of it not only feeling more confident about developing their own software, and perhaps even contributing to an academic open source project, but also seeing that they could use the python ecosystem to analyse their data in new and interesting ways.

For the developers, the hope was that they would get to talk to a range of current and prospective users and gain a better understanding of how people are using their code (and maybe pick up some contributors along the way).



I felt that a traditional didactic approach wouldn’t work, so there were no sessions of talks plus questions. In the end I stole shamelessly from the excellent series of Collaborations Workshops run by the Software Sustainability Institute in the UK. The workshop worked towards, and culminated in, a HackDay. I now believe HackDays are a great way not only of teaching but also of building teams. I am writing a post about this for the SSI and will link to it here when it is up.

I invited developers from two biomolecular python projects: MDAnalysis and pmx. With a larger budget I would have loved to invite other developers, e.g. from mdtraj. On the first day each project gave a short talk followed by around two hours of guided tutorial. Then, at the end of day one, I invited participants to present analysis problems from their own research that they would like to solve. Teams formed around six of these ideas. On day two the teams had around six hours to solve their problem before presenting their solution to the rest of the workshop. The winning project, MDARTINI, aimed to make MDAnalysis more aware of the coarse-grained forcefield MARTINI.



  • 94% of participants enjoyed the workshop
  • 100% learnt something useful that will help their research
  • 100% would recommend a workshop like this one to other researchers
  • 88% feel confident enough to contribute to an academic open source project

“I now understand enough to try using the following tools”

I then asked “I now understand enough to try using the following tools”. Most participants had heard of MDAnalysis but only a few had used it, and very few had heard of pmx, so this is an encouraging shift. This was followed up by “I intend using the tools and methods to help my research”. Usually the answers to this are a bit more pessimistic, as people might understand a tool but have no intention of using it. Here, though, it went the other way.


“I intend using the tools and methods to help my research.”

To try to understand which parts of the workshop went well, I asked participants to rate “I enjoyed the following components of the workshop”. The talks were rated OK and the HackDay higher, but the tutorials and meeting other researchers were rated most highly.


“I enjoyed the following components of the workshop.”

Finally, to find out if there were any practical problems I asked “The following elements contributed to making the workshop a success”.


“The following elements contributed to making the workshop a success.”

The big problem here was the network; we had better connectivity in our small hotel in Jülich. It turned out there was a problem with the wireless router in the room, which was fixed a few days after the workshop. Nor did many people like the location in Jülich. However, the general social atmosphere was appreciated, as were the various coffee breaks, which we were grateful to the Software Sustainability Institute for sponsoring.



Lessons for next time

This type of workshop is very complicated and plenty can go wrong, so always have a Plan B. For example, assume that not everyone will be able to install all the necessary software on their laptops, and come prepared with a (Linux) virtual machine image that works for all the tutorials. And don’t assume that the network will “just work”.

Software Carpentry Workshop, Jülich, 12-13 October 2015

Last week David Dotson from ASU and I ran a two-day Software Carpentry workshop to kick off the CECAM Macromolecular Simulation Software Workshop at the Forschungszentrum Jülich. The idea was to give a crash course to participants who were less familiar with python, or with working collaboratively using tools such as git, to bring them up to speed for the following five mini-workshops. As you can imagine, coffee and tea are essential for running an intensive bootcamp, and we owe thanks to The Software Sustainability Institute for sponsoring our coffee breaks.

As this was a self-organised workshop, there was no centrally coordinated survey of the participants to gauge their level of experience. Instead, I sent out a questionnaire very similar to one I’d sent before the first workshop I organised, back in 2012. As is often the case, the learners were more comfortable with bash and simple python but hadn’t heard of, or used, testing or version control. Interestingly, compared with that earlier workshop, a higher proportion of learners were experienced in bash and python. Both groups were drawn from the biomolecular simulation community, so this may reflect an increasing level of expertise.

The workshop itself was the smoothest I’ve been involved in; I think it helped that David and I have both taught several now. Also, devoting three hours to each of bash and version control, and then six hours to python (including coffee breaks), meant it wasn’t quite as rushed. The last workshop I taught was in January 2015. Since then the course materials have been overhauled, updated and separated from the workshop GitHub repository. The latest version of the materials seemed to work well.
The gap also meant I was unfamiliar with the evolution of IPython notebooks into Jupyter notebooks, which David used to teach. Interestingly, although there was only one helper, Charlie Laughton, we were never overwhelmed. At each workshop I have taught or organised, the ratio of helpers to learners has decreased, which may reflect improvements in installation and in the course materials.
Finally, I was live coding on my Mac laptop, and the new Split View in OS X 10.11 worked really well.

That is what I thought: what about the learners? I had fifteen responses to the questionnaire, about a two-thirds response rate. All of them agreed with the statements “I enjoyed the Software Carpentry workshop” and “I feel I learnt something useful that will help my research”. But, as we know, enjoyment does not necessarily translate into learning! As before, I asked two key questions. The first was “I now understand enough to try using the following tools/approaches.” As you can see, there is a big shift in attitude compared with before the workshop, with the majority of people feeling that they understood the tools covered.

But will this translate into a change in behaviour? To try to test this I also asked “I intend using the tools listed below to help my research”. The results are pretty similar, but interestingly people’s intentions were slightly stronger than their understanding. In other words, there was a slightly stronger response to the intention question than to the understanding question. The shift was more dramatic than at the workshop I ran in Oxford in January 2015. However, the two groups were drawn from different research areas, so they can’t be directly compared.

CECAM Macromolecular simulation software workshop

I’m co-organiser of this slightly different CECAM workshop in October 2015 at the Forschungszentrum Jülich, Germany. Rather than following the traditional format of three to four days of talks with the odd poster session, this is an extended workshop made up of six mini-workshops. Its focus is python-based tools for biomolecular simulations, of which there are an increasing number. The first mini-workshop will therefore be a Software Carpentry bootcamp, on which I will be lead instructor (helped by David Dotson from ASU). I’m also leading the next mini-workshop, on analysing biomolecular simulation data.