Discussion: compressed block table sizes #45
It turns out that a closer look at the previous spreadsheet (titled "png and zlib statistics") shows that 68% of PNG files have a single zlib block (and therefore wouldn't benefit from changing table sizes dynamically/at runtime). I took the 902 such files and reran my measurements on just that subset. Some notes based on that spreadsheet:
One thing that I don't fully understand is why I am getting much smaller gains when trying the same things through Chromium-based benchmarks (rather than through the standalone measurements above).

Next steps and open questions:
Quick correction of my previous comments: I previously wrote that "weighing by frame_info_buffer_size makes sense". I take that back. I've rerun the tests on ~900 single-zlib-block PNGs (from top 500 websites, gathered in 2023), this time recording the decoding time (rather than the throughput delta). I've also recorded the following (see the ad-hoc tool here and here):
The spreadsheet with my results can be found here. Some notes:
My plan for the next step is to put together an
Making this function

In the long term, we should try to collectively decide whether to:
I don't think we necessarily need to make this decision in the short term. Chromium can carry a small patch that changes the table sizes for Chromium experiments. We could even have Chromium pick the table sizes through a field-trial parameter (monomorphizing both options into the binary, and picking one at runtime based on the field-trial value).
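To make the monomorphization idea concrete, here is a minimal sketch (the function names are hypothetical, not the current fdeflate API) of compiling both table-size variants into the binary and choosing between them at runtime, e.g. based on a field-trial value:

```rust
// Hypothetical decompression routine, generic over the litlen/dist table sizes.
// Each instantiation is compiled separately, so the table sizes are
// compile-time constants inside the hot decoding loop.
fn decompress_with_tables<const LITLEN_ENTRIES: usize, const DIST_ENTRIES: usize>(
    input: &[u8],
    output: &mut Vec<u8>,
) {
    // ... the actual decoding loop would go here ...
    let _ = (input, output, LITLEN_ENTRIES, DIST_ENTRIES);
}

/// Picks one of the two pre-compiled instantiations based on a runtime flag
/// (e.g. a value coming from a Chromium field trial).
fn decompress(input: &[u8], output: &mut Vec<u8>, use_small_tables: bool) {
    if use_small_tables {
        decompress_with_tables::<512, 128>(input, output);
    } else {
        decompress_with_tables::<4096, 512>(input, output);
    }
}
```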
I've realized that in the initial version I had incorrectly counted doubles (including codes 256 and higher). I've edited the spreadsheet to fix this. This doesn't change the overall results AFAICT.
I'd suggest trying table sizes between 512 and 4096. It turns out that zlib-ng uses 10-bit tables (i.e. 1024 entries) and libdeflate uses 11-bit tables (2048 entries). In fact, Chromium switched its zlib decompressor from 9-bit to 10-bit tables a while back. Another trick that may be helpful: you can get an estimate of the size of a deflate block from the length of the code assigned to the end-of-block symbol. Huffman coding attempts to assign an n-bit code to a symbol that occurs with frequency 1/2^n, and the end-of-block symbol occurs once per block, so if it is assigned an n-bit code then the block will likely contain roughly 2^n symbols (or more if the maximum code length is used).
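If it helps, here is a tiny sketch of that heuristic (my own illustration of the idea above, not code taken from any existing decoder):

```rust
/// Rough estimate of the number of symbols in a deflate block, derived from the
/// bit length assigned to the end-of-block symbol (code 256) in the block's
/// litlen Huffman table. Huffman coding gives an n-bit code to a symbol with
/// frequency ~1/2^n, and end-of-block occurs exactly once per block, so an
/// n-bit end-of-block code suggests roughly 2^n symbols (or more when n is the
/// maximum allowed code length, 15 for deflate, because the length is clamped).
fn estimated_symbols_in_block(end_of_block_code_len: u32) -> u64 {
    1u64 << end_of_block_code_len
}

// Example: a 9-bit end-of-block code suggests roughly 512 symbols in the block,
// which would presumably favor smaller decode tables.
```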
Ack. I may indeed want to re-measure more table sizes with a bigger corpus. But just as a quick clarification, the last spreadsheet did cover 512/128, 1024/512, 2048/512, and 4096/512 table sizes - the
Thank you for these pointers - these weren't on my radar. Let me try skimming over https://dougallj.wordpress.com/2022/08/20/ and making quick notes about the proposed optimizations:
Good point. I tried to weight the doubles/singles/colds counts by symbol frequencies, but unfortunately it didn't really result in higher correlation coefficients. Maybe this is because we don't have a good estimate of the number of codewords in the input (
BTW, https://lib.rs/cargo-show-asm is like a local godbolt that can show the generated assembly for any function. Much easier than reproducing fragments of code online.
FWIW I've tried gathering data for a bigger corpus:
The results are in a spreadsheet here:
Given the mixed results, I am not sure if it's worth pursuing this direction further. Still, I think that we may want to enable continued experimentation in the future by merging #49.
Hello!
I just wanted to share some data and thoughts that stem from my experiments with using different table sizes in `CompressedBlock`. This is very exploratory/preliminary and I hope that we can discuss various ideas and directions before we commit to any particular solution (a random internet blogpost tells me that this may result in a better outcome :-P).

One experiment I've done is using the default table sizes (4096/512) when the estimated image bytes (width x height x samples-per-pixel x bits-per-sample / 8) is above a certain threshold, but using smaller tables (512/128) otherwise (see the Chromium CL here). The results I've got (see the section "Measurements 2024-12-30 - 2025-01-02" in my doc) have been positive, but the magnitude of the improvement has been disappointing to me.
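For reference, a minimal sketch of that heuristic (the struct, function name, and threshold value below are made up for illustration; the real experiment lives in the Chromium CL):

```rust
/// Litlen/dist table sizes to use for a given image.
struct TableSizes {
    litlen_entries: usize,
    dist_entries: usize,
}

/// Uses the default 4096/512 tables for images whose estimated decoded size is
/// above a threshold, and the smaller 512/128 tables otherwise.
fn pick_table_sizes(
    width: u64,
    height: u64,
    samples_per_pixel: u64,
    bits_per_sample: u64,
) -> TableSizes {
    // Estimated image bytes: width x height x samples-per-pixel x bits-per-sample / 8.
    let estimated_bytes = width * height * samples_per_pixel * bits_per_sample / 8;

    // Hypothetical threshold - the actual value would have to be tuned experimentally.
    const THRESHOLD_BYTES: u64 = 64 * 1024;

    if estimated_bytes > THRESHOLD_BYTES {
        TableSizes { litlen_entries: 4096, dist_entries: 512 }
    } else {
        TableSizes { litlen_entries: 512, dist_entries: 128 }
    }
}
```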
The results of that experiment were also surprisingly flat - I had expected that small images would significantly benefit from small tables (and big images from the big/default tables). One hypothesis that could explain this is that image size is not a good predictor of the size of zlib compressed blocks - e.g. maybe some big images use lots of relatively short compressed zlib blocks. So I ran another experiment to gather this kind of data on my old 2023 corpus of ~1650 PNG images from top 500 websites (see also the tool bits here and here) - the results can be found in a spreadsheet here. I think the following bits of data are interesting:
I also think that it is a bit icky that in my experiments the public API of `fdeflate` "leaks" the implementation detail of Huffman table sizes. One idea to avoid this is to:

- Move `CompressedBlock` and `fn read_compressed` out of `Decompressor`, so that `Decompressor` can internally choose to use small or big table sizes (with dynamic dispatch via something like `Box<dyn CompressedBlockRead[er]>`). I think that moving `fn read_compressed` to `impl ... CompressedBlock` can be made easier by packaging/encapsulating bits of `Decompressor` (to make it easier to pass them as a `&mut` reference to `fn read_compressed`) - for example, maybe `buffer` + `nbits` can become fields of a `BitBuffer` struct, and `queued_rle` + `queued_backref` can become variants of an `enum QueuedOutput`.
- Add something like `fdeflate::Decompressor::set_output_size_estimate(estimate: usize)`, which can be used to decide the initial table sizes. (Note that `png::ZlibStream` already has such an estimate available - it calls it `max_total_output`.) A rough sketch of both ideas follows below.
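Here is the rough sketch mentioned above (all names are hypothetical illustrations of the refactoring and the new entry point, not the current fdeflate API):

```rust
/// `buffer` + `nbits` packaged into a single struct, so the bit-reading state
/// can be passed around as one `&mut` reference.
struct BitBuffer {
    buffer: u64,
    nbits: u32,
}

/// `queued_rle` + `queued_backref` packaged into a single enum.
enum QueuedOutput {
    None,
    Rle { value: u8, length: usize },
    Backref { distance: usize, length: usize },
}

struct Decompressor {
    bits: BitBuffer,
    queued: QueuedOutput,
    output_size_estimate: Option<usize>,
    // ... remaining decoder state ...
}

impl Decompressor {
    /// Hypothetical API: lets the caller (e.g. `png::ZlibStream`, which already
    /// tracks `max_total_output`) pass down an output-size estimate that can
    /// then be used to pick the initial Huffman table sizes.
    fn set_output_size_estimate(&mut self, estimate: usize) {
        self.output_size_estimate = Some(estimate);
    }
}
```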