Usually, the protein that an antibiotic binds is essential for bacterial survival, which is how the drug has its effect. In that case, relatively few protein mutations arise that confer resistance, and those that do are often subtle in nature, so one can try to predict the phenotype of a mutation by considering how it affects the binding free energy of the antibiotic.
Resistance to pyrazinamide (PZA), a first-line anti-tuberculosis compound, arises mainly via genetic variation in the pncA gene, which, unusually, is not essential in M. tuberculosis. Consequently, one finds a wide range of genetic variation in clinical samples, from missense mutations to insertions and deletions and even the introduction of stop codons. This makes building a catalogue that specifies the effect of each genetic variant on the action of PZA more challenging, since many more variants have to be classified. A current leading resistance catalogue specifies the effect of over 450 pncA single nucleotide polymorphisms, yet even that level of detail only allows a prediction to be made for 75% of clinical samples.
In this preprint, Josh Carter has applied several machine learning methods to a curated, high-quality set of pncA mutations and, by including a range of structural and chemical features, is able to predict the effect of pncA missense mutations with a good degree of sensitivity and specificity. One application of this model would be to provide a preliminary classification for the 25% of clinical samples for which the heuristic catalogues cannot make a prediction.
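
For readers less familiar with the two metrics mentioned above, here is a minimal sketch of how sensitivity and specificity are computed for a resistant/susceptible classifier. The function name, the "R"/"S" labels and the ten example calls are all invented for illustration; none of them are taken from the preprint or its dataset.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[str],
    predicted: Sequence[str],
    resistant: str = "R",
) -> Tuple[float, float]:
    """Return (sensitivity, specificity), treating `resistant` as the positive class."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == resistant:
            if p == resistant:
                tp += 1  # resistant mutation correctly called resistant
            else:
                fn += 1  # resistant mutation missed
        else:
            if p == resistant:
                fp += 1  # susceptible mutation wrongly called resistant
            else:
                tn += 1  # susceptible mutation correctly called susceptible

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one resistant and one susceptible sample")

    sensitivity = tp / (tp + fn)  # fraction of resistant mutations detected
    specificity = tn / (tn + fp)  # fraction of susceptible mutations detected
    return sensitivity, specificity


# Hypothetical phenotypes for ten mutations (illustrative only).
truth = ["R", "R", "R", "R", "S", "S", "S", "S", "S", "S"]
predicted = ["R", "R", "R", "S", "S", "S", "S", "S", "R", "S"]

sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this toy example three of the four resistant mutations and five of the six susceptible mutations are called correctly. Reporting both numbers matters because a model could trivially achieve high sensitivity by calling every mutation resistant, at the cost of very poor specificity.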