Fowler Lab

Predicting antibiotic resistance de novo

Compressing FASTA files natively in Python

Philip Fowler, 23rd May 2019 (updated 26th May 2019)

The M. tuberculosis genome is pretty small, only 4.4 million nucleotides, so each genome stored as plaintext is about 4.2MB. That is not much, but when you have tens of thousands of genomes it starts to add up, particularly as I want to keep my data tree on my workstation so I can view the images produced by AMyGDA, some of which are then fed to BashTheBug. I’ve always thought it neat that in Python you can write and read compressed text files “on the fly” using gzip or bzip2, so how do they perform?

Both accept a compresslevel argument that runs from 1 to 9 (gzip also accepts 0, which means no compression) and tells the algorithm how hard to try to compress the text. How does that affect the time taken to compress a TB genome?
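For anyone who has not used this before, here is a minimal sketch of writing and reading compressed text on the fly with the standard library. The FASTA record and the file names are made up for illustration; a real genome would simply be a longer string.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical stand-in for a FASTA record; a real TB genome is ~4.4 million bases.
fasta_text = ">example_genome\n" + "ACGTTGCA" * 1000 + "\n"


def roundtrip(opener, path, text, level):
    """Write text to a compressed file at the given level, then read it back."""
    # "wt" and "rt" select text mode, so str goes in and str comes out.
    with opener(path, "wt", compresslevel=level) as handle:
        handle.write(text)
    with opener(path, "rt") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmpdir:
    gz_path = os.path.join(tmpdir, "genome.fasta.gz")    # illustrative file name
    bz_path = os.path.join(tmpdir, "genome.fasta.bz2")   # illustrative file name
    gz_text = roundtrip(gzip.open, gz_path, fasta_text, level=1)
    bz_text = roundtrip(bz2.open, bz_path, fasta_text, level=9)
```

Both gzip.open and bz2.open default to binary mode, so the "t" in the mode string matters if you want to work with strings rather than bytes.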

I’d expected some kind of linearity, but neither algorithm behaves that way, on this data at least. bzip2 seems to take about the same time whatever the setting is (these data were gathered using the %timeit magic, so each value is the mean of multiple repeats), whereas gzip suddenly slows down once you go past a compression level of 6.
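A rough way to repeat this kind of timing outside a notebook is sketched below. It is not the code used for the measurements above: it compresses in memory with gzip.compress and bz2.compress rather than writing files, it uses time.perf_counter instead of %timeit, and the "genome" is a short random sequence (the helper names and the 200,000-base length are my own choices), so the absolute numbers and possibly the shape of the curve will differ from a real TB genome.

```python
import bz2
import gzip
import random
import time


def random_genome(n_bases, seed=0):
    """Return a random ACGT sequence as bytes (a stand-in, not real data)."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=n_bases)).encode("ascii")


def time_levels(compress, data, levels=range(1, 10), repeats=3):
    """Return {level: mean seconds per call} for an in-memory compress function."""
    timings = {}
    for level in levels:
        total = 0.0
        for _ in range(repeats):
            start = time.perf_counter()
            compress(data, compresslevel=level)
            total += time.perf_counter() - start
        timings[level] = total / repeats
    return timings


genome = random_genome(200_000)  # far shorter than a real genome, to keep this quick
gzip_times = time_levels(gzip.compress, genome)
bzip2_times = time_levels(bz2.compress, genome)

for level in range(1, 10):
    print(f"level {level}: gzip {gzip_times[level]:.4f}s, bzip2 {bzip2_times[level]:.4f}s")
```

To time a real genome, read the FASTA file into bytes and pass that in place of the random sequence.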

What effect does that have on the achieved compression?

For bzip2, none: the same level of compression is achieved whatever the setting is. For gzip, it helps up to a point, but there are diminishing returns once you go past a compression level of 5 or 6, after which you are just slowing your code down and wasting electricity.
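The size comparison can be sketched in the same spirit. Again this is an illustration under assumptions rather than the code behind the results above: the input is a random ACGT sequence of 200,000 bases, which has none of the structure of a real genome, and compressed_sizes is a helper name I have made up.

```python
import bz2
import gzip
import random


def compressed_sizes(compress, data, levels=range(1, 10)):
    """Return {level: compressed size in bytes} for an in-memory compress function."""
    return {level: len(compress(data, compresslevel=level)) for level in levels}


rng = random.Random(0)
genome = "".join(rng.choices("ACGT", k=200_000)).encode("ascii")  # stand-in data

gzip_sizes = compressed_sizes(gzip.compress, genome)
bzip2_sizes = compressed_sizes(bz2.compress, genome)

for level in range(1, 10):
    gz_ratio = gzip_sizes[level] / len(genome)
    bz_ratio = bzip2_sizes[level] / len(genome)
    print(f"level {level}: gzip {gz_ratio:.1%} of original, bzip2 {bz_ratio:.1%}")
```

Plotting these ratios against the timings for the same levels is what lets you see where the diminishing returns start for your own data.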

I was expecting bzip2 to ‘win’ but I’ve ended up concluding that using gzip with a very low compression level (1 or 2) is a good compromise as it is very fast and you get most, but not all, of the compression you could otherwise get.
