New preprint: Predicting pyrazinamide resistance in M. tuberculosis using a graph convolutional network

Philip Fowler, 29th October 2025 / 30th October 2025

In previous work we’ve used “traditional” machine-learning approaches, like XGBoost, to learn and therefore predict which mutations in PncA confer resistance to pyrazinamide, one of the four first-line antibiotics used to treat tuberculosis. A key limitation is that, because the data are presented in a tabular form, one in effect learns mutation-by-mutation rather than allele-by-allele. We can get away with this in M. tuberculosis because the genetic variation is very low, but even then we have to discard alleles with multiple mutations. Another limitation is that we cannot give the model all the information embedded in the protein structure and instead have to collapse it down to simple features like distance from the binding site.

Here, Dylan Dissanayake has, for the first time, trained a graph convolutional network (GCN) on exactly the same Train/Test dataset as in that earlier work and obtained comparable performance. That might not seem worth the effort, but we were surprised, given its complexity and number of parameters, that a GCN did this well on what is a comparatively small dataset with little variation. This suggests that GCNs could be applied to other pathogens such as E. coli, where the “allelic explosion” in e.g. beta-lactamase genes would become an advantage for a GCN rather than a limiting factor in AMR prediction.

We have learnt a lot about GCNs from Joe Morrone, who is Dylan’s industrial supervisor and is based at IBM Research at Yorktown Heights in New York State. This has all been made possible because Dylan is part of the IBM Computational Discovery DPhil programme here at Oxford. Joe is, of course, an author on the manuscript.

In brief, the protein structure of each allele is predicted with AlphaFold2 and then used to build a graph in which the nodes are the amino acids, connected by edges if they are spatially proximal. Each node carries a vector of chemical and structural features, e.g. molecular weight, number of hydrogen bond donors and type of protein secondary structure. The graph is then passed into the GCN, which in this case has three layers before pooling to produce a final classification.
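
To make the pipeline a little more concrete, here is a toy sketch in plain Python of the steps described: building a contact graph from residue coordinates, passing node features through three graph-convolution layers, then pooling to a single score. It is not the code from the preprint. The 8 Å cutoff, the use of one coordinate per residue, the symmetric normalisation, the mean pooling, the sigmoid readout, the layer widths, the random (untrained) weights and all the feature values are assumptions made purely for illustration. A real model would be trained and would almost certainly be written with a deep-learning framework.

```python
import math
import random


def contact_graph(coords, cutoff=8.0):
    """Adjacency matrix with self-loops: residues i and j are joined
    if their coordinates lie within `cutoff` (assumed, in angstroms)."""
    n = len(coords)
    return [[1.0 if i == j or math.dist(coords[i], coords[j]) <= cutoff else 0.0
             for j in range(n)] for i in range(n)]


def normalise(adj):
    """Symmetric normalisation D^-1/2 A D^-1/2 of the adjacency matrix."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def random_weights(rng, n_in, n_out):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]


def predict(coords, features, weights, readout, cutoff=8.0):
    """Graph -> stacked GCN layers -> mean pool over nodes -> sigmoid score."""
    a_hat = normalise(contact_graph(coords, cutoff))
    h = features
    for w in weights:
        h = gcn_layer(a_hat, h, w)
    pooled = [sum(col) / len(h) for col in zip(*h)]  # one value per feature
    logit = sum(p * r for p, r in zip(pooled, readout))
    return 1.0 / (1.0 + math.exp(-logit))


# Four hypothetical residues spaced 3.8 angstroms apart along a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
# Made-up node features: scaled molecular weight, H-bond donors, helix flag.
features = [[0.9, 1.0, 1.0], [1.3, 2.0, 1.0], [1.7, 1.0, 0.0], [0.8, 0.0, 0.0]]

rng = random.Random(0)
weights = [random_weights(rng, 3, 4),   # three layers, as in the post
           random_weights(rng, 4, 4),
           random_weights(rng, 4, 4)]
readout = [rng.uniform(-0.5, 0.5) for _ in range(4)]

probability = predict(coords, features, weights, readout)
print(f"toy resistance score: {probability:.3f}")
```

Because the whole allele is encoded as one graph, an allele carrying several mutations is just another graph with different node features and possibly different edges, so it does not have to be discarded as in the tabular approach.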