Compression and crystallographic 2D images

Ever since area detectors appeared, storing the frames produced by an experiment has been troublesome. Bigger and faster detectors have made the problem worse, since they produce large numbers of large images.

Several methods exist to store the data, from no compression at all to very fancy compression algorithms. CBFlib contains at least two compression schemes: CCP4-style compression and byte offset compression. Manufacturers also use compression schemes in their own file formats. For example, CrysalisPro from Agilent uses an undocumented bit field compression scheme in its universal esperanto format. Dectris uses byte offset compression in the minicbf file format written by its Pilatus detectors. See also read/write cbf algorithm in fortran.

All these compression schemes give fairly good results. The byte offset algorithm also has the advantage of a huge dynamic range, up to 64 bits. However, widely available general-purpose compression tools such as gzip, bzip2, lzma and xz give similar or better results, which makes me wonder why so much time is spent on this…
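
To make the byte offset idea concrete, here is a minimal Python sketch of how I understand the scheme described for CBF: each pixel is stored as the difference from the previous one in a single signed byte, and a reserved escape value switches to 2, 4 and finally 8 bytes when the difference does not fit. The function names byte_offset_encode and byte_offset_decode are made up for this illustration, and the code is a sketch of the principle, not the CBFlib implementation.

```python
import struct


def byte_offset_encode(pixels):
    """Encode integers as byte-offset style deltas (illustrative sketch)."""
    out = bytearray()
    previous = 0
    for value in pixels:
        delta = value - previous
        if -127 <= delta <= 127:
            # Common case: the delta fits in one signed byte.
            out += struct.pack("<b", delta)
        elif -32767 <= delta <= 32767:
            # Escape byte 0x80, then the delta as a 16-bit integer.
            out += b"\x80" + struct.pack("<h", delta)
        elif -2147483647 <= delta <= 2147483647:
            # Escapes 0x80 and 0x8000, then the delta as a 32-bit integer.
            out += b"\x80\x00\x80" + struct.pack("<i", delta)
        else:
            # Escapes 0x80, 0x8000 and 0x80000000, then a 64-bit integer.
            out += b"\x80\x00\x80\x00\x00\x00\x80" + struct.pack("<q", delta)
        previous = value
    return bytes(out)


def byte_offset_decode(data):
    """Inverse of byte_offset_encode."""
    pixels = []
    previous = 0
    pos = 0
    while pos < len(data):
        (delta,) = struct.unpack_from("<b", data, pos)
        pos += 1
        if delta == -128:  # escape: read a 16-bit delta instead
            (delta,) = struct.unpack_from("<h", data, pos)
            pos += 2
            if delta == -32768:  # escape: read a 32-bit delta instead
                (delta,) = struct.unpack_from("<i", data, pos)
                pos += 4
                if delta == -2147483648:  # escape: read a 64-bit delta
                    (delta,) = struct.unpack_from("<q", data, pos)
                    pos += 8
        previous += delta
        pixels.append(previous)
    return pixels


if __name__ == "__main__":
    row = [10, 11, 11, 9, 500, 498, 100000, 3]
    packed = byte_offset_encode(row)
    assert byte_offset_decode(packed) == row
    print(len(packed), "bytes for", len(row), "pixels")
```

Whatever the image content, this layout never spends less than one byte per pixel, while a single large jump costs up to 15 bytes.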

Below is a comparison between some compression schemes.

  • Uncompressed data are 32-bit integers plus a 6.25KB header (uncompressed esperanto).
  • The byte offset compression is the one used in the cbf files from Dectris.
  • The bit field method is the one from compressed esperanto files (“dc imgimpexp” command, CrysalisPro 171.37.33 or later).
  • The bzip2 files were made by running bzip2 with no arguments on the uncompressed esperanto file.

File   Uncompressed   Byte offset (cbf)   Esperanto bitfield   bzip2
a      1.5MB          305.6KB             109.3KB              25.3KB
b      1.5MB          305.5KB             291.4KB              239.6KB
c      1.5MB          305.7KB             220.6KB              167.3KB
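
The sizes are easier to compare as compression ratios. The short Python sketch below computes them from the numbers in the table. It assumes that "1.5MB" means 1.5 × 1024 KB, which is only approximate since the table value is rounded, and the names SIZES_KB and compression_ratios are made up for this illustration.

```python
# Compressed sizes in KB, copied from the comparison table.
SIZES_KB = {
    "a": {"byte offset": 305.6, "bitfield": 109.3, "bzip2": 25.3},
    "b": {"byte offset": 305.5, "bitfield": 291.4, "bzip2": 239.6},
    "c": {"byte offset": 305.7, "bitfield": 220.6, "bzip2": 167.3},
}

# Assumption: the rounded "1.5MB" is taken as 1.5 * 1024 KB.
UNCOMPRESSED_KB = 1.5 * 1024


def compression_ratios(sizes_kb, uncompressed_kb=UNCOMPRESSED_KB):
    """Return uncompressed/compressed ratios, rounded to one decimal."""
    return {
        name: {
            method: round(uncompressed_kb / size, 1)
            for method, size in methods.items()
        }
        for name, methods in sizes_kb.items()
    }


if __name__ == "__main__":
    for name, ratios in compression_ratios(SIZES_KB).items():
        print(name, ratios)
```

Because the uncompressed size is rounded, the ratios are indicative only, but the ranking of the methods within each file does not depend on it.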

The minimum storage size per pixel is 1 byte in cbf versus 1 bit in the bit field compression. This explains why the bit field method performs much better on very low noise images, where the difference from the previous pixel is zero or one in most cases. I think the sfrm format from Bruker uses 1 byte per pixel plus overflow tables for 2, 4 and maybe 8 byte values. The result should be similar to byte offset compression. In any case, a simple bzip2 compression outperforms all the clever crystallographic algorithms, and bzip2 is widely available.
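
The comparison is easy to repeat with the compression modules of the Python standard library. The sketch below is only an illustration: it builds a synthetic low noise frame (random 0/1 pixels, a stand-in for real data, not one of the files in the table), packs it as 32-bit integers, and reports the compressed sizes next to the 1 byte per pixel and 1 bit per pixel floors discussed above. The names synthetic_frame and compare_sizes are made up for this example.

```python
import bz2
import lzma
import random
import struct
import zlib


def synthetic_frame(n_pixels=256 * 256, seed=0):
    """Return a fake low noise frame: mostly 0, sometimes 1 (assumption)."""
    rng = random.Random(seed)
    return [rng.choice((0, 0, 0, 1)) for _ in range(n_pixels)]


def compare_sizes(pixels):
    """Return sizes in bytes for several ways of storing the frame."""
    # Raw storage: little-endian 32-bit signed integers, no header.
    raw = struct.pack("<%di" % len(pixels), *pixels)
    return {
        "raw 32-bit": len(raw),
        # Lower bounds only, not actual encoders:
        "floor at 1 byte/pixel": len(pixels),
        "floor at 1 bit/pixel": (len(pixels) + 7) // 8,
        # General-purpose compressors from the standard library.
        "zlib": len(zlib.compress(raw, 9)),
        "bz2": len(bz2.compress(raw)),
        "lzma": len(lzma.compress(raw)),
    }


if __name__ == "__main__":
    for name, size in compare_sizes(synthetic_frame()).items():
        print("%-24s %8d bytes" % (name, size))
```

Running the same function on pixel values read from a real frame would show whether the general-purpose tools get below the 1 byte per pixel floor on that data.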
