-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hash function #19
Comments
For what it's worth, this trivial change to the hash function: diff --git a/qoi.h b/qoi.h
index 0aec728..da278a8 100644
--- a/qoi.h
+++ b/qoi.h
@@ -321,7 +321,7 @@ void *qoi_decode(const void *data, int size, qoi_desc *desc, int channels);
#define QOI_MASK_3 0xe0 // 11100000
#define QOI_MASK_4 0xf0 // 11110000
-#define QOI_COLOR_HASH(C) (C.rgba.r ^ C.rgba.g ^ C.rgba.b ^ C.rgba.a)
+#define QOI_COLOR_HASH(C) (C.rgba.r ^ C.rgba.g ^ C.rgba.b)
#define QOI_MAGIC \
(((unsigned int)'q') << 24 | ((unsigned int)'o') << 16 | \
((unsigned int)'i') << 8 | ((unsigned int)'f')) Has a noticable improvement in encode throughput on my laptop, for images/misc (1.18x) and images/screenshots (1.21x). The images/misc set (the only one with not-fully-opaque images) also shows slightly worse compression size (1.01x).
|
Here's another experiment, called "qoi-fifo2e". Instead of tweaking the hash function, we remove the hash function and just use a FIFO cache: keep the 64 most recently written (not necessarily most recently used/read) colors. Update the cache on DIFF, LUMA, RGB and RGBA ops. Leave the cache unchanged on INDEX and RUN ops. Here's the diff (the code is a little awkward because I was trying to minimize the diff): --- qoi-master-2ee2169.h 2021-12-12 08:44:59.203920322 +1100
+++ qoi-fifo2e.h 2021-12-15 13:29:18.194335467 +1100
@@ -387,6 +387,7 @@
const unsigned char *pixels = (const unsigned char *)data;
qoi_rgba_t index[64] = {0};
+ unsigned int index_wpos = 0;
int run = 0;
qoi_rgba_t px_prev = {.rgba = {.r = 0, .g = 0, .b = 0, .a = 0}};
@@ -419,13 +420,15 @@
run = 0;
}
- int index_pos = QOI_COLOR_HASH(px) % 64;
-
+ for (int index_pos = 0;; index_pos++) {
+ if (index_pos < 64) {
if (index[index_pos].v == px.v) {
bytes[p++] = QOI_OP_INDEX | index_pos;
+ break;
+ }
}
else {
- index[index_pos] = px;
+ index[index_wpos++ & 63] = px;
if (px.rgba.a == px_prev.rgba.a) {
char vr = px.rgba.r - px_prev.rgba.r;
@@ -464,7 +467,10 @@
bytes[p++] = px.rgba.b;
bytes[p++] = px.rgba.a;
}
+ break;
}
+ } // end-for (int index_pos = 0;; index_pos++)
+
}
px_prev = px;
}
@@ -516,6 +522,7 @@
qoi_rgba_t px = {.rgba = {.r = 0, .g = 0, .b = 0, .a = 0}};
qoi_rgba_t index[64] = {0};
+ unsigned int index_wpos = 0;
int run = 0;
int chunks_len = size - QOI_PADDING;
@@ -530,12 +537,14 @@
px.rgba.r = bytes[p++];
px.rgba.g = bytes[p++];
px.rgba.b = bytes[p++];
+ index[index_wpos++ & 63] = px;
}
else if (b1 == QOI_OP_RGBA) {
px.rgba.r = bytes[p++];
px.rgba.g = bytes[p++];
px.rgba.b = bytes[p++];
px.rgba.a = bytes[p++];
+ index[index_wpos++ & 63] = px;
}
else if ((b1 & QOI_MASK_2) == QOI_OP_INDEX) {
px = index[b1];
@@ -544,6 +553,7 @@
px.rgba.r += ((b1 >> 4) & 0x03) - 2;
px.rgba.g += ((b1 >> 2) & 0x03) - 2;
px.rgba.b += ( b1 & 0x03) - 2;
+ index[index_wpos++ & 63] = px;
}
else if ((b1 & QOI_MASK_2) == QOI_OP_LUMA) {
int b2 = bytes[p++];
@@ -551,12 +561,11 @@
px.rgba.r += vg - 8 + ((b2 >> 4) & 0x0f);
px.rgba.g += vg;
px.rgba.b += vg - 8 + (b2 & 0x0f);
+ index[index_wpos++ & 63] = px;
}
else if ((b1 & QOI_MASK_2) == QOI_OP_RUN) {
run = (b1 & 0x3f);
}
-
- index[QOI_COLOR_HASH(px) % 64] = px;
}
if (channels == 4) { Here's the numbers. Encode speed gets 2.4x worse because we now have to compare against all 64 cache entries (instead of just 1) on each non-run pixel. Decode speed gets 1.2x better because
|
Refactored a little and tried making an AVX2 path with intrinsics. @notnullnotvoid this may not be optimal as I have zero experience, don't let this existing deter you from rolling your own and/or benchmarking. This exists to learn and to apply it to a variant, I'm not benchmarking further. Compile with -mavx2 for avx2. Uses a gcc builtin to count trailing zeroes, msvc and maybe others will need their own definitions.
|
Can you add a row for qoi-master to this benchmark? |
Done. Didn't redo the other results like you're supposed to but the environmental conditions are the same as half an hour ago ;) |
Currently, it's:
This is about as simple as it gets, which has its own virtues. There may be better ones, although be aware that better compression (size) might mean worse performance (throughput).
Some discussion (amongst other things) is at phoboslab/qoi#48
The text was updated successfully, but these errors were encountered: