Advanced Topic: fpzip and kempressed Encodings

How to Use

In your Neuroglancer info file, set the "encoding" field for a given mip level to "fpzip" to use the raw fpzip compression algorithm or to "kempressed" to use Kempression. Unfortunately, these codecs are only supported by CloudVolume at the moment.

Official Neuroglancer support is forthcoming, but you can visualize fpzip and kempressed encodings using this Neuroglancer branch: https://github.com/william-silversmith/neuroglancer/tree/wms_fpzip

Example info File:

{
  "type": "image",
  "data_type": "float32",
  "num_channels": 3,
  "scales": [{
      "chunk_sizes": [[ 256, 256, 16 ]],
      "encoding": "fpzip",
      "key": "4_4_40",
      "resolution": [ 4, 4, 40 ],
      "size": [ 80000, 60000, 1890 ],
      "voxel_offset": [ 0, 0, 0 ]
  }]
}

Example using CloudVolume

from cloudvolume import CloudVolume

info = CloudVolume.create_new_info(
    num_channels    = 3,
    layer_type      = 'image',
    data_type       = 'float32', 
    encoding        = 'kempressed', # or fpzip
    resolution      = [4, 4, 40], # Voxel scaling, units are in nanometers
    voxel_offset    = [0, 0, 0], # x,y,z offset in voxels from the origin
    chunk_size      = [ 128, 128, 64 ], # units are voxels
    volume_size     = [ 250000, 250000, 25000 ], # e.g. a cubic millimeter dataset
)

vol = CloudVolume(..., info=info)
vol.commit_info()
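
Once the info file is committed, float32 affinities are written and read with ordinary CloudVolume array slicing, and the fpzip/kempressed encoding is applied transparently. A minimal sketch, assuming a hypothetical gs://mybucket/affinities layer and chunk-aligned bounds:

import numpy as np
from cloudvolume import CloudVolume

vol = CloudVolume('gs://mybucket/affinities', mip=0)  # placeholder path

# Upload one chunk-aligned cutout of X, Y, Z affinities (x, y, z, channel).
affinities = np.random.rand(128, 128, 64, 3).astype(np.float32)
vol[0:128, 0:128, 0:64] = affinities

# Download it again; decoding happens inside CloudVolume.
cutout = vol[0:128, 0:128, 0:64]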

The Problem

In some connectomics segmentation pipelines, an important intermediate step is to generate voxel pair affinities in the X, Y, and Z dimensions. These affinities are represented as three channels of float32s, so compared with the original single-channel uint8 image they are 12x as large (3 channels × 4 bytes versus 1 byte per voxel). Unfortunately, we have found that they do not compress sufficiently with gzip.

Fpzip

Into this void steps fpzip, a fast lossless compression algorithm for multi-dimensional floating point data developed by Peter Lindstrom et al. at LLNL.

Kempression

As non-FIBSEM connectomics datasets are highly anisotropic (often at ratios between 5:1 and 10:1 for the Z axis), Nico Kemnitz found that reorganizing the data from XYZC to XYCZ groups more similar data near each other. He also found that, because our data all lie between 0 and 1, adding 2.0f to every value forces all the floating point values to share the same exponent, at the cost of a machine epsilon of precision.

"Kempressed" data consist of these two manipulations plus fpzip compression.

Implementation

The fpzip codec uses the C++ code written by the fpzip authors. We added a Cython interface (fpzip.pyx) and a Python extension compilation toolchain to enable its use in CloudVolume. Unfortunately, C and C++ extensions are not very well supported by the Python ecosystem, so our extension requires installing numpy prior to installing cloud-volume. If numpy is not pre-installed, fpzip compilation is skipped because the numpy header files are required to compile our wrapper.
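
Purely as an illustration (not CloudVolume's exact internals), round-tripping a chunk through the standalone fpzip Python bindings looks roughly like the sketch below; numpy must be installed before the bindings so the wrapper can compile:

import numpy as np
import fpzip  # assumes the pip package named fpzip

# A float32 affinity chunk; fpzip handles up to four dimensions.
data = np.random.rand(256, 256, 16, 3).astype(np.float32)

compressed = fpzip.compress(data, precision=0)  # precision=0 requests lossless
restored = fpzip.decompress(compressed)

print(data.nbytes, len(compressed))  # raw vs compressed byte counts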

Performance

The following data were compiled by Kemnitz over 100 runs on a 256x256x16x3 connectomics dataset (1.17 GiB) imaged at 4x4x40 nm. Encoding and Decoding times cover the +2.0 and axis-reordering manipulations, Compression and Decompression times cover the codec, Total is the compressed size, and Ratio is the compressed size as a fraction of the original.

| Manipulation | Codec    | Encoding (s) | Compression (s) | Decompression (s) | Decoding (s) | Total (MiB) | Ratio  |
|--------------|----------|--------------|-----------------|-------------------|--------------|-------------|--------|
| None         | gzip -6  | 0            | 91.1            | 12.31             | 0            | 779.57      | 64.96% |
| +2.0         | gzip -6  | 0.29         | 82.6            | 14.97             | 0.29         | 674         | 56.17% |
| +2.0 & XYCZ  | gzip -6  | 0.51         | 81.05           | 12.09             | 0.51         | 674.05      | 56.17% |
| None         | zstd -14 | 0            | 112.02          | 5.04              | 0            | 709.96      | 59.16% |
| +2.0         | zstd -14 | 0.29         | 117.4           | 4.73              | 0.29         | 605.35      | 50.45% |
| +2.0 & XYCZ  | zstd -14 | 0.51         | 114.4           | 5.12              | 0.51         | 603.75      | 50.31% |
| None         | fpzip    | 0            | 19.13           | 29.19             | 0            | 561.49      | 46.79% |
| +2.0         | fpzip    | 0.29         | 16.32           | 21.21             | 0.29         | 458.56      | 38.21% |
| +2.0 & XYCZ  | fpzip    | 0.51         | 14.3            | 18.67             | 0.51         | 395.04      | 32.92% |