Color index array as FIFO #94

phoboslab · 2021-12-19T14:15:24Z

As proposed by @nigeltao and advocated for in #48, the fifo-branch implements the color index as a FIFO, instead of using a hash. It is up to implementers how to search through this FIFO in the encoder. The naive approach would be a linear search. The implementation in the fifo-branch uses a secondary hash table.

master:

A running array[64] (zero-initialized) of previously seen pixel values is 
maintained by the encoder and decoder. Each pixel that is seen by the encoder
and decoder is put into this array at the position formed by a hash function of
the color value. In the encoder, if the pixel value at the index matches the
current pixel, this index position is written to the stream as QOI_OP_INDEX. 
The hash function for the index is:
	
	index_position = (r * 3 + g * 5 + b * 7 + a * 11) % 64

fifo:

A running array[64] (zero-initialized) of pixel values is maintained by the
encoder and decoder. Every pixel en-/decoded by the QOI_OP_DIFF, QOI_OP_LUMA, 
QOI_OP_RGB and QOI_OP_RGBA chunks is written to this array. The write position
starts at 0 and is incremented with each pixel written. The position wraps back
to 0 when it reaches 64. I.e:

	index[index_pos % 64] = current_pixel;
	index_pos = index_pos + 1;

An encoder can search this array for the current pixel value and, if a match is
found, emit a QOI_OP_INDEX with the position within the array.

Pros:

not having to dictate a specific hash function. Implementers can search the index in whichever way suits their needs. Probably nice for hardware implementations or platforms that have trouble with multiplications. In practice I doubt that it matters.
slight speedup in the decoder (245mpps vs. 224mpps on my machine; funny enough this is still slower than using lookup tables for the color hash)
makes the decoder simpler

Cons:

somewhat harder to explain. We have to be careful to specify when exactly the color index is written to and at which position
puts more burden on the implementation to make the encoder fast, instead of being fast by default
makes the encoder more complicated

Compression ratio with the FIFO is only slightly better in total (even when using an exhaustive linear search). Slightly worse with some particular images.

I'm still undecided. In theory it's cool to not specify the hash function, but in practice we don't gain much here and (imho) make it more complicated. I'm curious to hear if there are any objections, specifically from other implementers. @pfusik, @MasterQ32?

The text was updated successfully, but these errors were encountered:

chocolate42 · 2021-12-19T15:17:32Z

Using hash_pos during encode is a neat solution because AFAICT it's scalable, if encoder memory is very tight hash_pos can be some lower multiple of index size which only mildly decreases efficiency due to more collisions in hash_pos. Some higher multiple seems fine for normal use, and maximal compressors can always use linear search if they wish. Eliminating the hash entirely from the decode path seems very beneficial, IMO it's far more critical that the decode path be as simple as possible.

pfusik · 2021-12-19T15:33:19Z

On x86, x * 3 and x * 5 correspond to single LEA instructions. No need for lookup tables.

I like the simpler and faster decoder and the option to trade encoding time for improved compression ratio. As I understand, FIFO can have better compression ratio for every possible input?

Is your FIFO encoder slower? How much?

phoboslab · 2021-12-19T16:04:07Z

As I understand, FIFO can have better compression ratio for every possible input?

No. Even with exhaustive search, some images encode slightly worse compared to master. I assume this is because it's beneficial for these images to refer to "older" pixels, further back then 64.

(lfifo is linear search, hfifo is the current hash impl in the fifo-branch)

## Total for images/textures_photo
        decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi.lfifo:    6.0        30.2        173.32         34.73      1984   64.6%
qoi.hfifo:    6.0         9.0        174.84        115.99      1985   64.6%
qoi.master:   6.6         9.4        158.82        111.93      1981   64.5%

In the majority of cases, fifo compresses better, though. Sometimes substantially.

## Total for images/textures_pk
        decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi.lfifo:    0.3         1.2        165.44         37.53        72   54.0%
qoi.hfifo:    0.3         0.4        163.83        107.48        73   54.7%
qoi.master:   0.3         0.4        147.57        104.48        75   56.5%

Totals:

# Grand total for images
        decode ms   encode ms   decode mpps   encode mpps   size kb    rate
qoi.lfifo:    1.9         7.8        243.15         59.79       460   28.0%
qoi.hfifo:    1.9         2.7        244.66        174.31       462   28.1%
qoi.master:   2.1         2.9        224.66        161.11       463   28.2%

Is your FIFO encoder slower? How much?

As you can see, encoding is a bit faster with the fifo (hash impl) compared to master, for reasons I don't understand.

wbd73 · 2021-12-19T16:52:58Z

Those numbers look good for the fifo approach and the source code only looks slightly more complicated.
An advantage of fifo is that each color is guaranteed to stay in the queue for 64 color changes.
It's more predictable. With the original code it can be much longer but in case of bad luck (hash collision) also very short.

I do agree that it is somewhat harder to explain how it works.

ikskuh · 2021-12-19T21:11:10Z

I think i'd still prefer the hash variant with 35711, as it's easier to implement in general and easier to explain.

For performance comparison, one would need to implement a nice vectorized lookup function for the fifo, as this is something that can nicely be vectorized, so it might not have much impact on encoding speed.

I personally would probably stay with the hash impl

pfusik · 2021-12-19T21:18:07Z

I think a "SIMD FIFO" encoder would be possible, with 64-entry std::map implemented with SIMD.

Your only cons are implementation complexity, but QOI remains straightforward compared to JPEG, PNG and even GIF.

phoboslab · 2021-12-19T22:14:38Z

After careful consideration, I have decided against this. It doesn't feel right. QOI emphasizes simplicity (and explaining how it works is part of that) so much everywhere else that it would be weird to sacrifice it here for slight gains in compression and throughput.

This makes the current specification in master final.

Thanks everyone!

phoboslab mentioned this issue Dec 19, 2021

The final(?) specification #48

Closed

phoboslab closed this as completed Dec 19, 2021

oscardssmith mentioned this issue Dec 22, 2021

Some simple patterns compress very poorly #122

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Color index array as FIFO #94

Color index array as FIFO #94

phoboslab commented Dec 19, 2021

chocolate42 commented Dec 19, 2021

pfusik commented Dec 19, 2021

phoboslab commented Dec 19, 2021

wbd73 commented Dec 19, 2021 •

edited

Loading

ikskuh commented Dec 19, 2021

pfusik commented Dec 19, 2021

phoboslab commented Dec 19, 2021

Color index array as FIFO #94

Color index array as FIFO #94

Comments

phoboslab commented Dec 19, 2021

chocolate42 commented Dec 19, 2021

pfusik commented Dec 19, 2021

phoboslab commented Dec 19, 2021

wbd73 commented Dec 19, 2021 • edited Loading

ikskuh commented Dec 19, 2021

pfusik commented Dec 19, 2021

phoboslab commented Dec 19, 2021

wbd73 commented Dec 19, 2021 •

edited

Loading