The Modernising Medical Microbiology (MMM) group in Oxford, of which I am a part, is pioneering genetics-based clinical microbiology. The central idea is to infer which antibiotics can be used to treat an infection by examining the mutations in the pathogen's genome and looking up their effect in a catalogue of previously seen cases. Achieving this goal requires (1) accurately mapping which mutations confer resistance through large-scale genomic sampling projects and (2) developing predictive methods that can deal with novel or rare mutations.

Alternatively, predictive methods could be used in the development of new antibiotics (or the modification of existing ones) to determine how many mutations allow the bacteria to escape the action of the drug. Minimising this number should, I hope, prolong the lifespan of an antibiotic.
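The catalogue lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the MMM group's software: the catalogue layout, the gene-to-drug map, the function name `predict` and the "U" (unknown) label are all assumptions made here, and the two catalogue entries are well-known M. tuberculosis resistance mutations chosen only as examples.

```python
# Assumed mapping from each gene to the drugs it can affect (illustrative only).
GENE_TO_DRUGS = {"rpoB": ("rifampicin",), "katG": ("isoniazid",)}

# Assumed catalogue layout: (gene, mutation, drug) -> "R" (resistant) or "S" (susceptible).
CATALOGUE = {
    ("rpoB", "S450L", "rifampicin"): "R",
    ("katG", "S315T", "isoniazid"): "R",
}

# When several mutations affect one drug, "R" outranks "U", which outranks "S".
RANK = {"S": 0, "U": 1, "R": 2}


def predict(mutations):
    """Return {drug: "R" | "S" | "U"} for a sample's list of (gene, mutation) pairs."""
    prediction = {drug: "S" for drugs in GENE_TO_DRUGS.values() for drug in drugs}
    for gene, mutation in mutations:
        # Mutations in genes not linked to any drug are ignored.
        for drug in GENE_TO_DRUGS.get(gene, ()):
            # A mutation missing from the catalogue has an unknown effect.
            effect = CATALOGUE.get((gene, mutation, drug), "U")
            if RANK[effect] > RANK[prediction[drug]]:
                prediction[drug] = effect
    return prediction
```

A mutation that is absent from the catalogue, such as the made-up katG A999V, yields "U" for isoniazid rather than a confident call. That gap is what the predictive methods are intended to fill.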

The CRyPTIC project, which is led by the MMM group, is collecting between 30,000 and 50,000 clinical samples of M. tuberculosis over the next few years. Each sample will have its whole genome sequenced and its susceptibility to 14 different anti-TB drugs tested using a 96-well microtitre plate. If a mutation in a key gene is repeatedly associated with resistance to a specific drug, and that resistance cannot be explained any other way, then we can infer that this mutation confers resistance. Identifying these signals requires the errors in both datasets to be minimised. Determining whether bacteria (here, M. tuberculosis) are growing in a small well is a subjective and difficult classification task. Whilst experts are experienced and can apply their reasoning to what they see, in practice only a single expert may view each plate, making systematic errors likely.

To help with this problem I am developing software to automatically read the 96-well plates and have also launched a Citizen Science project, BashTheBug, that generates a consensus view of each plate.

1. Automatic Mycobacteria Growth Detection Algorithm (AMyGDA)

I am developing some image processing software, AMyGDA, that can automatically detect the growth of mycobacteria in these images. The key advantage, of course, is that the software is consistent and, whilst it can be confused by artefacts, these are known effects. An example is shown below – AMyGDA has marked each well where it has detected growth.

You can read more about AMyGDA in this post or in this bioRxiv preprint, or you can download the software here.


2. BashTheBug

As part of the AMyGDA process the locations of all the wells in each image are identified. This makes it straightforward to “cut up” each image into a series of wells, each of which has a single antibiotic in increasing concentration. In April 2017 I launched a Citizen Science project, called BashTheBug, on the Zooniverse platform, inviting anyone to help us classify which wells have M. tuberculosis growing and which do not. The project has been highly successful: after six months, over 8,350 Citizen Scientists have done over 540,000 classifications. In August 2017 the project won the Online Community award in the NIHR Let’s Get Digital Competition, and it has since been featured by BBC Radio and AAAS Science Update. Below is a screenshot, but the best way to understand how the project works is to give it a go.
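To illustrate how many individual classifications could be combined, here is a minimal sketch. The text does not say how BashTheBug forms its consensus, so a simple majority vote is assumed here; the function names, the concentrations and the votes are all hypothetical. The second function reads off the minimum inhibitory concentration from a strip of wells of increasing concentration.

```python
from collections import Counter


def consensus_growth(votes):
    """Majority vote over True/False growth calls for one well (a tie counts as no growth)."""
    counts = Counter(votes)
    return counts[True] > counts[False]


def minimum_inhibitory_concentration(concentrations, growth):
    """Lowest concentration at which, and above which, no growth is seen.

    Returns None if the bacteria grow even at the highest concentration.
    """
    mic = None
    # Walk down from the highest concentration until the first well with growth.
    for concentration, grew in sorted(zip(concentrations, growth), reverse=True):
        if grew:
            break
        mic = concentration
    return mic


# Hypothetical strip of four wells (concentrations in mg/L), three volunteers each.
concentrations = [0.25, 0.5, 1.0, 2.0]
votes_per_well = [
    [True, True, False],
    [True, True, True],
    [False, False, True],
    [False, False, False],
]
growth = [consensus_growth(votes) for votes in votes_per_well]
mic = minimum_inhibitory_concentration(concentrations, growth)
```

Reading the strip from the highest concentration downwards means an isolated "no growth" call at a low concentration does not lower the reported value, which is a deliberately conservative choice in this sketch.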


3. Predicting antibiotic resistance

My third and most ambitious project is to develop computational methods able to predict antibiotic resistance by considering the structures of the target protein and the antibiotic, and how they dynamically interact with one another.

My hypothesis is that mutations in an open-reading frame confer resistance by reducing how well the antibiotic binds to its target protein, whilst not altering how well the natural substrate binds. Prediction is therefore a matter of determining how the binding free energies of both molecules change upon introduction of the protein mutation. Calculating small molecule-protein binding free energies is a mature field. Probably the best-known approach is computational docking, which uses a simple heuristic scoring function to estimate how well a small molecule binds in a specific orientation to a protein. Whilst fast, these methods take no account of the dynamics of either protein or drug and are not accurate enough for a problem of this subtlety. Instead I am applying alchemical free energy methods, a class of methods derived from statistical mechanics over sixty years ago. These require, however, 4-5 orders of magnitude more computational resource than simple computational docking and, as a result, have only recently begun to find application outside of theoretical physical chemistry.
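The hypothesis can be written compactly. The symbols below are notation introduced here for illustration and are not taken from the original work; a more negative binding free energy means tighter binding, so a positive change means the mutation weakens binding.

```latex
\Delta\Delta G_{\mathrm{drug}}
  = \Delta G_{\mathrm{bind}}^{\mathrm{mutant}}(\mathrm{drug})
  - \Delta G_{\mathrm{bind}}^{\mathrm{wild\ type}}(\mathrm{drug}) > 0,
\qquad
\Delta\Delta G_{\mathrm{substrate}}
  = \Delta G_{\mathrm{bind}}^{\mathrm{mutant}}(\mathrm{substrate})
  - \Delta G_{\mathrm{bind}}^{\mathrm{wild\ type}}(\mathrm{substrate}) \approx 0

\frac{K_d^{\mathrm{mutant}}}{K_d^{\mathrm{wild\ type}}}
  = \exp\!\left(\frac{\Delta\Delta G}{RT}\right)
```

At 298 K, RT is about 0.59 kcal/mol, so a tenfold weakening of binding corresponds to a change of only about 1.4 kcal/mol. This gives a sense of how accurate a method must be to distinguish resistant from susceptible mutations.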

This research will also benefit from the large amount of genomic and drug susceptibility data being collected by the CRyPTIC project. I am collaborating with Derrick Crook, Tim Peto, Sarah Walker and other members of the CRyPTIC project.

My preliminary study examined the effect of mutations on the binding of trimethoprim, an antibiotic, to DHFR, an essential protein in S. aureus. I chose seven mutations that were identified by a previous study which sequenced the genomes of S. aureus infections from two hospitals in the UK. Three of these cause resistance (coloured red above) and four have no effect on the action of the antibiotic (coloured green).

Using sophisticated alchemical free energy methods I was able to not only predict which mutations caused resistance and which did not, but also get good quantitative agreement with experimental measurements of how the binding free energy changes upon mutation (for the F99Y mutant) and also with measured minimum inhibitory concentrations. This work is currently under review and I will update this text when it is published.

These methods work by calculating the work required to change one amino acid side-chain into another, as shown in the movie below. The movie zooms in on Leu41 and shows how first the cost of removing the partial electrical charges is calculated, then the side-chain is “alchemically” changed to a phenylalanine and, lastly, the partial electrical charges are added back.
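The idea of summing the cost of several small alchemical stages can be shown with a toy model. The sketch below uses Zwanzig's exponential-averaging formula, one of the oldest estimators in this family; the text does not say which estimator or software was actually used, so this is an assumption. Instead of a protein it "transforms" a one-dimensional spring in stages from a soft to a stiff spring constant, a case where the exact answer is known. All names and parameter values are hypothetical.

```python
import math
import random

KT = 0.593  # thermal energy in kcal/mol at roughly 298 K


def fep_step(k_from, k_to, n_samples, rng):
    """Zwanzig estimate of the free energy change between two harmonic states.

    The potential is U(x) = 0.5 * k * x**2. Samples are drawn exactly from the
    Boltzmann distribution of the starting state, which is a Gaussian.
    """
    sigma = math.sqrt(KT / k_from)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        delta_u = 0.5 * (k_to - k_from) * x * x  # energy difference at this sample
        total += math.exp(-delta_u / KT)
    return -KT * math.log(total / n_samples)


def staged_free_energy(k_values, n_samples=20000, seed=0):
    """Sum the free energy changes over consecutive stages k_values[i] -> k_values[i + 1]."""
    rng = random.Random(seed)
    return sum(
        fep_step(k_from, k_to, n_samples, rng)
        for k_from, k_to in zip(k_values, k_values[1:])
    )


def exact_free_energy(k_start, k_end):
    """Analytic result for the harmonic oscillator: 0.5 * kT * ln(k_end / k_start)."""
    return 0.5 * KT * math.log(k_end / k_start)


estimate = staged_free_energy([1.0, 2.0, 4.0, 8.0])
reference = exact_free_energy(1.0, 8.0)
```

Splitting the change into stages keeps neighbouring states similar, which is what makes each estimate converge. The discharge, transform and recharge stages in the movie play the same role for a real side-chain.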

This technique is derived from classical statistical mechanics and has been known for over sixty years; however, it is only comparatively recently that computers have become fast enough to allow us to try using these approaches on real-world problems, like antibiotic resistance.