New software: pygsi

Philip Fowler, 31st August 2018

Whenever a paper involving sequencing the genomes of bacteria (or other species, for that matter) is published, the researchers are obliged to deposit the reads (usually short reads) in either the European Nucleotide Archive (ENA) or the Sequence Read Archive (SRA), along with some metadata. Sounds good, but until recently there has been a flaw: whilst one could deposit the short-read files, one could only search the associated metadata. This meant that if, say, you wanted to search the ENA for samples containing MCR-1, an important, recently identified gene that confers colistin resistance, and it wasn't explicitly mentioned in the metadata (and most of the time it wouldn't have been, since it wouldn't have been identified yet!), you'd have had to download all the possible short-read files and then trawl through them. In other words, the ENA and SRA were archives: easy to put data into, difficult to search and interrogate.

Zam Iqbal and his group have developed a searchable index of all the bacterial and viral pathogen genetic data in the ENA/SRA as of late 2017. It is called BIGSI and you can try it here (the resemblance to an early Google is not, I suspect, a coincidence) and you can find the preprint here. It doesn't seem like much, but suddenly we can ask all sorts of interesting questions, like: how many samples contain MCR-1?

One problem is that when we look for a gene, we are usually looking for the reference sequence and its associated minor variants (e.g. a couple of SNP differences). With the current BIGSI interface this is hard, since you'd have to systematically give it every possible variant of your base k-mer.
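To make the scale of that concrete, here is a minimal Python sketch (not taken from pygsi) that enumerates every sequence differing from a given k-mer at exactly one position. It assumes the k-mer contains only the bases A, C, G and T, and the function name `single_snp_variants` is hypothetical. Each of the k positions has three alternative bases, so a k-mer has 3k single-SNP variants.

```python
from typing import List

BASES = "ACGT"


def single_snp_variants(kmer: str) -> List[str]:
    """Return every sequence that differs from `kmer` at exactly one position.

    Assumes the k-mer contains only A, C, G and T.
    """
    kmer = kmer.upper()
    variants = []
    for i, ref_base in enumerate(kmer):
        for alt_base in BASES:
            if alt_base != ref_base:
                # Swap in the alternative base at position i, keep the rest unchanged
                variants.append(kmer[:i] + alt_base + kmer[i + 1:])
    return variants


if __name__ == "__main__":
    variants = single_snp_variants("ACGTACGTAC")
    print(len(variants))  # 3 alternatives at each of 10 positions -> 30
```

With k = 31, chosen here purely as an illustrative value, that is already 93 variants per k-mer, each of which would need its own query.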
Fortunately, doing things systematically is something computers are good at, so as a hack (because ultimately I imagine something like BIGSI will become a service at the EBI, and this sort of functionality will then be included), I wrote a Python package that takes a gene, walks along its sequence and asks BIGSI how many times each minor variant occurs. Since each variant requires a web API call, it isn't rapid, but you can work through a single gene overnight. The package, including a more detailed description and examples, can be downloaded from its GitHub repository.
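The walk along a gene can be sketched in the same spirit. This is a hypothetical outline, not pygsi's actual code: `query` stands in for whatever function makes the BIGSI web API call and returns a count, and it is passed in as a parameter so the sketch runs without network access. A gene of length L contains L - k + 1 overlapping k-mers, each needing up to 3k + 1 lookups (the reference k-mer plus its single-SNP variants), which is why the number of web requests, and hence the run time, adds up quickly.

```python
from typing import Callable, Dict, List

BASES = "ACGT"


def single_snp_variants(kmer: str) -> List[str]:
    """Every sequence differing from kmer at exactly one position (A/C/G/T only)."""
    return [
        kmer[:i] + alt + kmer[i + 1:]
        for i, ref in enumerate(kmer)
        for alt in BASES
        if alt != ref
    ]


def number_of_queries(gene_length: int, k: int) -> int:
    """Upper bound on lookups: each of the (L - k + 1) windows needs 3k + 1 of them."""
    return (gene_length - k + 1) * (3 * k + 1)


def scan_gene(gene: str, k: int, query: Callable[[str], int]) -> Dict[str, int]:
    """Slide a window of length k along the gene and query the reference k-mer
    plus all its single-SNP variants, caching results so that no sequence is
    sent to the (slow) query function twice."""
    gene = gene.upper()
    counts: Dict[str, int] = {}
    for start in range(len(gene) - k + 1):
        kmer = gene[start:start + k]
        for seq in [kmer] + single_snp_variants(kmer):
            if seq not in counts:
                counts[seq] = query(seq)  # hypothetical stand-in for a web API call
    return counts


if __name__ == "__main__":
    # A toy stand-in for BIGSI: pretend each sequence occurs once per 'A' it contains.
    def fake_query(seq: str) -> int:
        return seq.count("A")

    results = scan_gene("ACGTACGT", k=4, query=fake_query)
    print(number_of_queries(8, 4))  # 5 windows x 13 lookups = 65
    print(len(results))             # 52: the repeated window ACGT is only queried once
```

The caching is a choice made for this sketch, since every avoided request saves a round trip; whether pygsi does anything similar is not stated here.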