New preprint: Predicting pyrazinamide resistance in M. tuberculosis using a graph convolutional network
Philip Fowler, 29th October 2025 / 30th October 2025
In previous work we've used "traditional" machine-learning approaches, like XGBoost, to learn, and therefore predict, which mutations in PncA confer resistance to pyrazinamide, one of the four first-line antibiotics used to treat tuberculosis. A key limitation is that, because the data are presented in tabular form, one in effect learns mutation-by-mutation rather than allele-by-allele. We can get away with this in M. tuberculosis because the genetic variation is very low, but even then we have to discard alleles with multiple mutations. Another limitation is that we cannot give the model all the information embedded in the protein structure and instead have to collapse it down to simple features like distance from the binding site.
Here, Dylan Dissanayake has, for the first time, trained a graph convolutional network (GCN) on exactly the same Train/Test dataset as above and gets comparable performance. That might not seem worth the effort, but we were surprised that a GCN, given its complexity and number of parameters, did this well on what is a comparatively small dataset with little variation. This suggests that GCNs could be applied to other pathogens such as E. coli, where the "allelic explosion" in e.g. beta-lactamase genes becomes an advantage for a GCN rather than a limiting factor in AMR prediction.
We have learnt a lot about GCNs from Joe Morrone, who is Dylan's industrial supervisor and is based at IBM Research at Yorktown Heights in New York State. This has all been made possible because Dylan is part of the IBM Computational Discovery DPhil programme here at Oxford. Joe is, of course, an author on the manuscript.
In brief, the protein structure of each allele is predicted with AlphaFold2 and then used to build a graph in which the nodes are the amino acids, connected by edges if they are spatially proximal. Each node carries a vector of chemical and structural features, e.g. molecular weight, number of hydrogen bond donors, type of protein secondary structure etc. The graph is then passed into the GCN which, in this case, has three layers before pooling to produce a final classification.
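To make the pipeline above a little more concrete, here is a minimal NumPy sketch of the general idea; it is not the code used in the preprint. Everything specific in it is an assumption for illustration: the 8 Å distance cutoff, the random coordinates standing in for an AlphaFold2 structure, the feature and hidden dimensions, the untrained random weights, and the choice of mean pooling followed by a sigmoid.

```python
import numpy as np

def contact_graph(coords, cutoff=8.0):
    """Adjacency matrix linking residues closer than `cutoff` (assumed value, in angstroms)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < cutoff).astype(float)
    np.fill_diagonal(adj, 1.0)  # self-loops so each residue keeps its own features
    return adj

def normalise(adj):
    """Symmetric normalisation D^-1/2 A D^-1/2, as in a standard GCN layer."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(coords, features, weights, w_out):
    """GCN layers (one per weight matrix), then mean pooling and a sigmoid."""
    a_hat = normalise(contact_graph(coords))
    h = features
    for w in weights:
        h = np.maximum(a_hat @ h @ w, 0.0)  # aggregate neighbours, transform, ReLU
    pooled = h.mean(axis=0)                 # one vector for the whole protein
    return 1.0 / (1.0 + np.exp(-(pooled @ w_out)))

# Placeholder inputs: random coordinates and features, NOT real PncA data.
rng = np.random.default_rng(0)
n_residues, n_features, hidden = 50, 6, 16
coords = rng.normal(scale=10.0, size=(n_residues, 3))
features = rng.normal(size=(n_residues, n_features))

# Three layers, matching the description above; weights are untrained.
shapes = [(n_features, hidden), (hidden, hidden), (hidden, hidden)]
weights = [rng.normal(scale=0.1, size=s) for s in shapes]
w_out = rng.normal(scale=0.1, size=hidden)

p_resistant = gcn_forward(coords, features, weights, w_out)
print(f"score for this (fake) allele: {p_resistant:.3f}")
```

Because the weights are random, the printed score is meaningless; in practice one would train them on labelled alleles, most likely with a dedicated graph-learning library rather than plain NumPy. The point is only that each allele becomes its own graph, so alleles with several mutations need no special handling.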