Set of plugins for the HDF5 library

hdf5plugin

This module provides HDF5 compression filters (namely: blosc, bitshuffle and lz4) and registers them with the HDF5 library used by h5py.

Supported platforms are: Linux, Windows, macOS.

Whenever possible, HDF5 compression filter plugins are best installed system-wide or through Anaconda (blosc-hdf5-plugin, hdf5-lz4). Nevertheless, hdf5plugin provides a generic way to enable the provided HDF5 compression filters in h5py.

The HDF5 plugin sources were obtained from:

Installation

To install, just run:

pip install hdf5plugin

To install for the current user only, run:

pip install hdf5plugin --user

Documentation

To use it, simply import hdf5plugin; the supported compression filters then become available from h5py.

Sample code:

import numpy
import h5py
import hdf5plugin

# Compression
f = h5py.File('test.h5', 'w')
f.create_dataset('data', data=numpy.arange(100), compression=hdf5plugin.LZ4)
f.close()

# Decompression
f = h5py.File('test.h5', 'r')
data = f['data'][()]
f.close()

hdf5plugin provides:

  • The HDF5 filter IDs of the embedded plugins:
    • hdf5plugin.BLOSC
    • hdf5plugin.BSHUF
    • hdf5plugin.LZ4
  • Compression option helpers (See Compression options):
    • hdf5plugin.BSHUF_LZ4_OPTS: bitshuffle filter options for default block size and LZ4 compression.
    • hdf5plugin.blosc_options(level=5, shuffle='byte', compression='blosclz'): Function to prepare compression_opts parameter to use with blosc compression.
  • hdf5plugin.FILTERS: A dictionary mapping the provided filters to their IDs.
  • hdf5plugin.PLUGINS_PATH: The directory where the provided filter libraries are stored.

Compression options

Compression filters can be configured with the compression_opts argument of the h5py.Group.create_dataset method by providing a tuple of integers.

The meaning of those integers is filter dependent and is described below.

bitshuffle

compression_opts: (block_size, lz4 compression)

  • block size: Number of elements (not bytes) per block. It MUST be a multiple of 8. Default: 0 for a block size of about 8 kB.
  • lz4 compression: 0: disabled (default), 2: enabled.

By default the filter performs bitshuffle but does NOT compress with LZ4.
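As an illustration, the option tuple can be built with a small helper. This is a hypothetical sketch, not part of hdf5plugin's API:

```python
# Hypothetical helper (not provided by hdf5plugin) building the
# (block_size, lz4 compression) tuple described above.
def bitshuffle_opts(block_size=0, lz4=False):
    """Return compression_opts for the bitshuffle filter.

    block_size is a number of elements (not bytes) and must be a
    multiple of 8; 0 selects the default block size of about 8 kB.
    """
    if block_size % 8 != 0:
        raise ValueError("block_size must be a multiple of 8 elements")
    # 2 enables LZ4 compression after bitshuffling, 0 disables it.
    return (block_size, 2 if lz4 else 0)
```

With the defaults this yields (0, 0), while bitshuffle_opts(lz4=True) yields (0, 2), the same tuple as hdf5plugin.BSHUF_LZ4_OPTS.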

Example: Dataset compressed with bitshuffle+LZ4

f = h5py.File('test.h5', 'w')
f.create_dataset('bitshuffle_with_lz4',
                 data=numpy.arange(100),
                 compression=hdf5plugin.BSHUF,
                 compression_opts=(0, 2))  # or hdf5plugin.BSHUF_LZ4_OPTS
f.close()

blosc

compression_opts: (0, 0, 0, 0, compression level, shuffle, compression)

  • First 4 values are reserved.
  • compression level: From 0 (no compression) to 9 (maximum compression). Default: 5.
  • shuffle: Shuffle filter:
    • 0: no shuffle
    • 1: byte shuffle
    • 2: bit shuffle
  • compression: The compressor blosc ID:
    • 0: blosclz (default)
    • 1: lz4
    • 2: lz4hc
    • 3: snappy (not available in hdf5plugin)
    • 4: zlib
    • 5: zstd

By default the filter uses byte shuffle and blosclz.

Example: Dataset compressed with blosc using bit shuffle and lz4

f = h5py.File('test.h5', 'w')
f.create_dataset(
    'data',
    data=numpy.arange(100),
    compression=hdf5plugin.BLOSC,
    # Equivalent to compression_opts=(0, 0, 0, 0, 5, 2, 1)
    compression_opts=hdf5plugin.blosc_options(
        shuffle='bit', compression='lz4'))
f.close()
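The mapping from named options to the integer tuple can be sketched as follows. This is a hypothetical illustration of the encoding described above, not hdf5plugin's actual implementation of blosc_options:

```python
# Hypothetical sketch (not hdf5plugin's implementation) of how named blosc
# options map to the (0, 0, 0, 0, level, shuffle, compression) tuple.
_SHUFFLE = {'none': 0, 'byte': 1, 'bit': 2}
_COMPRESSION = {'blosclz': 0, 'lz4': 1, 'lz4hc': 2,
                'snappy': 3, 'zlib': 4, 'zstd': 5}

def blosc_opts_sketch(level=5, shuffle='byte', compression='blosclz'):
    """Return compression_opts for the blosc filter."""
    if not 0 <= level <= 9:
        raise ValueError("level must be in the range 0-9")
    # The first four values are reserved and left as 0.
    return (0, 0, 0, 0, level, _SHUFFLE[shuffle], _COMPRESSION[compression])
```

For example, blosc_opts_sketch(shuffle='bit', compression='lz4') returns (0, 0, 0, 0, 5, 2, 1).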

lz4

compression_opts: (block_size,)

  • block size: Number of bytes per block. It MUST be < 1.9 GB. Default: 0 for a block size of 1 GB.
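The single-element tuple and its size limit can likewise be sketched with a small hypothetical helper (not part of hdf5plugin):

```python
# Hypothetical helper (not provided by hdf5plugin) building the
# (block_size,) tuple for the lz4 filter and enforcing the documented limit.
def lz4_opts(block_size=0):
    """Return compression_opts for the lz4 filter.

    block_size is in bytes; 0 selects the default 1 GB block size.
    """
    if block_size >= int(1.9 * 1024**3):
        raise ValueError("block_size must be < 1.9 GB")
    return (block_size,)
```

For example, lz4_opts(1 << 20) returns (1048576,) for 1 MB blocks.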

Dependencies

  • h5py

Testing

To run self-contained tests, from Python:

import hdf5plugin.test
hdf5plugin.test.run_tests()

Or, from the command line:

python -m hdf5plugin.test

To also run tests relying on actual HDF5 files, run from the source directory:

python test/test.py

This tests the installed version of hdf5plugin.

License

The source code of hdf5plugin itself is licensed under the MIT license. Use it at your own risk. See LICENSE.

The source code of the embedded HDF5 filter plugin libraries is licensed under different open-source licenses. Please read the different licenses:

The HDF5 v1.10.5 headers (and Windows .lib file) used to build the filters are stored for convenience in the repository. The license is available here: src/hdf5/COPYING.
