
feat: support fpzip and kempressed codecs #391

Open · wants to merge 14 commits into master
Conversation

@william-silversmith (Contributor) commented May 17, 2022

Hi Jeremy,

Our automated segmentation pipeline operates on a different principle than Google BRAIN's FFN, which I understand produces a segmentation directly from the output of the network. We produce a voxel affinity map from a boundary detector, which is later post-processed into a segmentation through an "agglomeration" step.

These data are float32 and 3-channel, so 12x larger than the original uncompressed image. For petascale inference, this became expensive to store, so in 2018/2019 we investigated alternative compression algorithms and found fpzip, a lossless compression algorithm for floating point data. Nico Kemnitz did some experimentation that exploits the fact that our affinities lie in the range 0-1: the "kempressed" variant adds 2 to the data and swaps the Z and channel axes to achieve higher compression. Overall, a 2x to 3x improvement in compression is achieved, making large scale storage of affinities temporarily viable instead of impossible. A table can be seen here: https://github.com/seung-lab/cloud-volume/wiki/Advanced-Topic:-fpzip-and-kempressed-Encodings
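A minimal sketch of the kempressed pre-transform as described above (the layout and names here are illustrative, not the PR's actual code):

// "Kempressed" pre-transform sketch: shift affinities from [0, 1] into
// [2, 3] (which keeps the float sign and exponent bits constant) and
// swap the Z and channel axes before handing the data to fpzip.
// Layout assumed here: x fastest, input indexed as [c][z][y][x].
void kempress(const float* in, float* out,
              size_t sx, size_t sy, size_t sz, size_t nc) {
  for (size_t c = 0; c < nc; c++)
    for (size_t z = 0; z < sz; z++)
      for (size_t y = 0; y < sy; y++)
        for (size_t x = 0; x < sx; x++) {
          size_t src = x + sx * (y + sy * (z + sz * c));  // [c][z][y][x]
          size_t dst = x + sx * (y + sy * (c + nc * z));  // [z][c][y][x]
          out[dst] = in[src] + 2.0f;  // decoding subtracts 2 afterward
        }
}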

Unfortunately, we haven't been able to visualize these data easily without decompressing or quantizing them, which leads to underutilization of this codec.

This PR adds "fpzip" and "kempressed" encoding support to Neuroglancer using the fpzip-1.3.0 library (https://github.com/LLNL/fpzip). CloudVolume supports fpzip and kempression via Python bindings (https://github.com/seung-lab/fpzip).

I experimented with different em++ settings to optimize size: -Oz produces a reasonable binary of about 37 KB, while -O3 is closer to 1 MB but may be faster.

The fpzip library is BSD licensed since 1.3.0.

Thanks for your consideration Jeremy!

size_t get_nf() { return nf; }

void decode_headers(unsigned char *data) {
  FPZ* fpz = fpzip_read_from_buffer(static_cast<void*>(data));
Collaborator:

This seems to be a strange/broken API provided by fpzip, since it accepts data but does not know the number of bytes available.

Please investigate what the correct bounds check is and add it.

Contributor (author):

I checked the header file, and the missing num_bytes is consistent with the declared API: https://github.com/LLNL/fpzip/blob/develop/include/fpzip.h#L192-L196

Reading the code, the minimum byte stream (for the header) appears to be at least 6 x uint32_t.

Contributor (author):

Previously, though, I found that a minimum of 28 bytes is required, though I am not sure why the last 4 are needed. I'll add that as a check.
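A minimal sketch of such a check, assuming the empirically observed 28-byte minimum above (the wrapper shape and names are illustrative; the fpzip_* calls are from fpzip.h):

// Sketch: reject streams too small to safely read the fpzip header.
// 28 bytes is the empirical minimum noted above (6 x uint32_t = 24,
// plus 4 bytes whose purpose is unclear).
constexpr size_t kMinFpzipStreamBytes = 28;

bool decode_headers(unsigned char *data, size_t num_bytes) {
  if (num_bytes < kMinFpzipStreamBytes) {
    return false;  // too short to contain a valid header
  }
  FPZ* fpz = fpzip_read_from_buffer(static_cast<void*>(data));
  if (!fpzip_read_header(fpz)) {  // fpzip returns 0 on failure
    fpzip_read_close(fpz);
    return false;
  }
  // ... read nx/ny/nz/nf etc. from fpz here ...
  fpzip_read_close(fpz);
  return true;
}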

Collaborator:

Actually, I see that the decoding itself, even after reading the header, also does no bounds checking. I don't think there is any safe way to use this API as is. The upstream repository needs to be fixed to include proper bounds checking.

Contributor (author):

Okay, I'm addressing this with the upstream maintainer. I'll revisit this PR, probably in a few days or weeks.

@jbms (Collaborator) commented May 17, 2022

Now that there are a number of wasm modules in Neuroglancer, and given that esbuild still doesn't support code splitting for non-ES-module workers, it would be good to change the bundling to load the wasm files separately rather than embedding them as data: URLs. That way users only download the modules they actually need.

@william-silversmith (Contributor, author):
I'll investigate ESBuild some more, but what would you recommend as a good place to get started?

@jbms (Collaborator) commented May 18, 2022

> I'll investigate ESBuild some more, but what would you recommend as a good place to get started?

It may be as simple as changing 'dataurl' here to 'file':

loader: {'.wasm': 'dataurl'},

https://esbuild.github.io/content-types/#external-file
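If so, the changed option would presumably look like this (per the esbuild docs linked above):

// Emit each .wasm file as a separate file and resolve imports to its
// URL, rather than inlining the bytes as a base64 data: URL.
loader: {'.wasm': 'file'},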

@william-silversmith (Contributor, author):
Interesting. I just tried this (sorry it took so long) and it successfully split the wasm fetches into separate requests; I was able to visualize a compresso-encoded volume. However, the wasm modules were not lazy loaded. It looks like code splitting is implemented for ESM, but it will take me some time to understand the implications for wasm.

@jbms (Collaborator) commented Aug 5, 2022

The reason they aren't being lazy loaded is that we have e.g. compressModulePromise in sliceview/compresso/index.ts created at global scope, which causes the wasm to be fetched as soon as the containing bundle is loaded. Instead, it would need to change so that the promise is only created the first time it is needed.
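A minimal TypeScript sketch of that deferral (the names and wasm URL here are illustrative, not Neuroglancer's actual code):

// Sketch: create the module promise on first use instead of at module
// load time, so the .wasm file is only fetched when actually needed.
let compressoModulePromise:
    Promise<WebAssembly.WebAssemblyInstantiatedSource> | undefined;

function getCompressoModule() {
  if (compressoModulePromise === undefined) {
    compressoModulePromise = WebAssembly.instantiateStreaming(
        fetch(new URL('./compresso.wasm', import.meta.url)));
  }
  return compressoModulePromise;
}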

@william-silversmith (Contributor, author):
Ah, I thought that might be the case. If I have a moment, I'll try to work out how to change it to load on demand.
