New preprint: Predicting pyrazinamide resistance in M. tuberculosis using a graph convolutional network

Philip Fowler, 29th October 2025 / 30th October 2025

In previous work we've used "traditional" machine-learning approaches, like XGBoost, to learn and therefore predict which mutations in PncA confer resistance to pyrazinamide, one of the four first-line antibiotics used to treat tuberculosis. A key limitation is that, because the data are presented in tabular form, one in effect learns mutation-by-mutation rather than allele-by-allele. We can get away with this in M. tuberculosis because its genetic variation is very low, but even then we have to discard alleles with multiple mutations. Another limitation is that we cannot give the model all the information embedded in the protein structure and instead have to collapse it down to simple features like distance from the binding site.

Here, Dylan Dissanayake has, for the first time, trained a graph convolutional network (GCN) on exactly the same Train/Test dataset as in that earlier work and gets comparable performance. That might not seem worth the effort, but we were surprised, given its complexity and number of parameters, that a GCN did this well on what is a comparatively small dataset with little variation. This suggests that GCNs could be applied to other pathogens, such as E. coli, where the "allelic explosion" in e.g. beta-lactamase genes becomes an advantage for a GCN rather than a limiting factor in AMR prediction.

We have learnt a lot about GCNs from Joe Morrone, who is Dylan's industrial supervisor and is based at IBM Research at Yorktown Heights in New York State. This has all been made possible because Dylan is part of the IBM Computational Discovery DPhil programme here at Oxford. Joe is, of course, an author on the manuscript.

In brief, the protein structure of each allele is predicted with AlphaFold2 and then used to build a graph in which the nodes are the amino acids, connected by edges if they are spatially proximal. Each node carries a vector of chemical and structural features, e.g. molecular weight, number of hydrogen bond donors and type of protein secondary structure. The graph is then passed into the GCN, which in this case has three layers before pooling to produce a final classification.
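
To make the graph-then-GCN idea concrete, here is a minimal NumPy sketch of the forward pass. It is not the code used in the preprint, and almost everything in it is an assumption made for illustration: the 8 Å distance cutoff for "spatially proximal", the random coordinates standing in for an AlphaFold2 structure, the random node features, the untrained random weights, the ReLU activations, the symmetric adjacency normalisation, the mean pooling and the sigmoid output. The function names (`contact_adjacency`, `normalise`, `gcn_forward`) are hypothetical.

```python
import numpy as np


def contact_adjacency(coords, cutoff=8.0):
    """Adjacency matrix linking residues closer than `cutoff` (assumed, in Angstrom).

    The diagonal is 1 because each residue is at distance 0 from itself,
    which gives every node a self-loop.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < cutoff).astype(float)


def normalise(adj):
    """Symmetric normalisation D^-1/2 A D^-1/2 (one common choice for GCNs)."""
    deg = adj.sum(axis=1)  # never zero thanks to the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(coords, features, weights, w_out, cutoff=8.0):
    """Graph convolution layers, then mean pooling, then a sigmoid score."""
    a_hat = normalise(contact_adjacency(coords, cutoff))
    h = features
    for w in weights:
        # Each layer mixes every node's features with those of its neighbours.
        h = np.maximum(a_hat @ h @ w, 0.0)  # ReLU
    pooled = h.mean(axis=0)  # one vector for the whole protein
    logit = pooled @ w_out
    return 1.0 / (1.0 + np.exp(-logit))


# Toy inputs: all sizes and values are placeholders, not real PncA data.
rng = np.random.default_rng(0)
n_residues, n_features, hidden = 50, 10, 16
coords = rng.normal(scale=10.0, size=(n_residues, 3))     # stand-in for predicted C-alpha positions
features = rng.normal(size=(n_residues, n_features))      # stand-in for per-residue features

# Three layers, matching the depth mentioned in the post; weights are untrained.
layer_shapes = [(n_features, hidden), (hidden, hidden), (hidden, hidden)]
weights = [rng.normal(scale=0.1, size=shape) for shape in layer_shapes]
w_out = rng.normal(scale=0.1, size=hidden)

p_resistant = gcn_forward(coords, features, weights, w_out)
print(f"score for this toy allele: {p_resistant:.3f}")
```

Because the graph is rebuilt for each allele, an allele with several mutations is just another graph with different node features and possibly different edges, which is why this representation does not need the multiple-mutation alleles to be discarded. In practice one would train the weights and would probably use a library such as PyTorch Geometric rather than hand-written matrix products.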