The final(?) specification #48
Things where I see potential for improvement:
Apart from that, I'm already super happy with the format, as it is very CPU-friendly in terms of branching (my implementation has 0.9% branch misses, which surprised me), and it looks like people have already streamed QOI video to the Oculus Quest 1 at 50 FPS, so it seems to be usable for such heavy-load use cases as well. And I agree that the format doesn't need any new features. It's really good at what it already does.
IMO, QOI version 1 is a really good first version. I think it makes sense to wait for a month or so before committing to a second version, since we want to make sure there is time to find good ideas. nigeltao/qoi2-bikeshed#14 has some really good analysis of opcode frequency which will be very useful for optimizing opcodes. While I understand why you don't want different 3- vs 4-channel behavior, I think we should look carefully at it. Separating them can give a pretty easy 10% increase in compression ratio because you simply have shorter opcodes, so you can fit more data in. This comes at a minor complexity cost, but I think it is somewhat offset by the fact that it simplifies the …
I think it's really nice to be able to differentiate between linear encoded channels vs. sRGB-encoded RGB channels plus a linear alpha channel (as this is the most common case for "images in the wild"). However, I'd like to point out that having the default … I'm not sure what's the best way to handle this. IMO, … This would lead to …
In terms of a better hash function, I think the best approach would be to just interpret the RGBA value as a 32-bit uint, and perform an xor and a multiply by two 32-bit prime numbers. This should give much better randomization at very low cost. While we're improving the index, it also might make sense to try adding a 16-bit instruction that is a combination of index with delta (QOI_INDEX_DELTA?). You could use 4 bits for the tag, 6 bits to store the index, and have 6 bits left to store the lower 2 bits of the RGB values. This could be encoded and decoded relatively easily by having a second table where values are hashed by their upper 6 bits, which would allow O(1) lookup at the cost of a little memory. I think this instruction would be really valuable for things like photographs and dithered images, where a little noise lowers the number of exact matches. Currently for these types of images, we use a 32-bit QOI_COLOR tag roughly 14% of the time, and QOI_DIFF_24 roughly 12% of the time. For images without alpha, this tag would reduce those sizes by 50% and 33% when used, which would be a really big advantage.
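A minimal sketch of the suggested xor-multiply hash, assuming a 64-entry index. The prime constants and the function name are illustrative choices, not from any spec:

```c
#include <stdint.h>

/* Hypothetical index hash: treat the pixel as one 32-bit value and mix it
   with xors and multiplications by 32-bit prime constants. */
static uint8_t qoi_index_hash(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
    uint32_t v = ((uint32_t)r << 24) | ((uint32_t)g << 16) |
                 ((uint32_t)b << 8)  |  (uint32_t)a;
    v ^= v >> 16;        /* fold the high bits into the low bits */
    v *= 2654435761u;    /* 32-bit prime (Knuth's multiplicative constant) */
    v ^= v >> 16;
    v *= 2246822519u;    /* second 32-bit prime multiply */
    return (uint8_t)(v >> 26); /* keep the best-mixed top 6 bits */
}
```

Two multiply-xor rounds like this cost only a handful of cycles per pixel and spread entropy from all four channels into the top bits, which is where the 6-bit index is taken from.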
This sounds very complex to me, and even if it might improve the compression ratio, the compression speed will heavily suffer from it, as you now have to search the index array.
True, I think we could go with something easier though, as multiplication is still costly on lower-end hardware. Maybe an adapted Pearson hash for 6 bits with a 64-byte LUT, but the LUT makes my performance-drop sense tingle. I think doing some xor-shifting while not discarding some bits might already be sufficient.
No, you don't. In compression the code is roughly: …
The key here is that you store a second index where the values are hashed based on their upper 6 bits rather than the whole value. This keeps the lookup at O(1) at the expense of some extra memory. |
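A sketch of the two-table scheme just described. All names and hash multipliers here are hypothetical; the point is that the second table is keyed on the truncated color, so a near match is found in O(1):

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } px_t;

static px_t index_exact[64];  /* the normal QOI-style index          */
static px_t index_trunc[64];  /* second index, keyed on upper 6 bits */

/* Hash a pixel by the upper 6 bits of each channel, so colors differing
   only in their lowest 2 bits land in the same slot on purpose. */
static int trunc_hash(px_t p) {
    return ((p.r >> 2) * 3 + (p.g >> 2) * 5 +
            (p.b >> 2) * 7 + (p.a >> 2) * 11) & 63;
}

/* Both tables are updated for every pixel the codec sees. */
static void index_put(px_t p) {
    index_exact[(p.r * 3 + p.g * 5 + p.b * 7 + p.a * 11) & 63] = p;
    index_trunc[trunc_hash(p)] = p;
}

/* Encoder side: O(1) lookup of a color whose upper 6 bits all match, so
   only the lower 2 bits of RGB need to be emitted as the delta part. */
static int near_match(px_t p, px_t *out) {
    px_t c = index_trunc[trunc_hash(p)];
    if ((c.r >> 2) == (p.r >> 2) && (c.g >> 2) == (p.g >> 2) &&
        (c.b >> 2) == (p.b >> 2) && (c.a >> 2) == (p.a >> 2)) {
        *out = c;
        return 1;
    }
    return 0;
}
```

The extra memory cost is one more 64-entry table (256 bytes for RGBA); the decoder only needs it to resolve the index half of the op.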
OK, I guess we still have some things to iron out. Let's set the deadline for the final spec to 2021.12.20. Thanks for your patience <3
Was there a technical reason the spec chose to adopt big endian over little?
I think separate channels is likely a really bad idea. If you have separate channels, it becomes impossible to compress while keeping byte alignment, which is really bad for performance. Delta coding also does increase the size by 1 bit, since you go from …
No. Big endian seemed like the right thing to do for a file format. I guess I'll have to revisit #36 - but there's tradeoffs either way.
All interesting ideas, but the result would be a different file format. Outside of the scope of what I'm willing to change here.
How did you arrive at these numbers? I did a quick and dirty test and got a ~3% smaller file size for the Kodak images.
For QOI_INDEX_DELTA, I just meant that when it's applicable, it replaces a 3- or 4-byte value with a 2-byte value, not that that would be the effect over the whole image. For the test, did you use the current …?
It does expand the file though. See my above comment. |
No... that's not how it works. In theory, yes, but in practice, no. I guess it wraps around. You just consider that from 255 to 0 you add 1, or from 0 to 255 you subtract 1. So it's like a loop of 0-255 values. Imagine them wrapped around a tube.
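The wrap-around behavior described here is just modular arithmetic on bytes; in C, unsigned 8-bit math does it for free. A tiny sketch (the helper name is made up):

```c
#include <stdint.h>

/* Apply a signed per-channel delta with wrap-around, i.e. mod-256
   arithmetic: 255 + 1 wraps to 0 and 0 - 1 wraps to 255 -- the "tube"
   described above. */
static uint8_t apply_delta(uint8_t channel, int delta) {
    return (uint8_t)(channel + delta);  /* the cast back wraps mod 256 */
}
```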
Since the lower 2 bits are omitted for the trunc_index, I did: …
Can you try a version with …?
How? Normally this: RGBA,RGBA is 8 bytes in both approaches. Where is the loss of byte alignment?
This means you have to sweep the output memory four times instead of one time, hurting your cache locality a lot. One of the reasons why QOI is so fast is that it will touch every memory cell in both input and output exactly once. If we change this, we will get huge performance losses which aren't recoverable by any "win" in the compression rate.
Decompression speed is usually more important; compression speed should still be fast enough. An operation like "channel splitting" could work at near-memcpy speeds.
A good compiler schedules the RGBA writes so they all happen in the same cycle. It would also unroll the loop. You won't lose much more speed than a memcpy would have lost you.
Letting it mature sounds like a good idea, I guess. Though on the other hand, even if you had 3 different revisions of the format it would still be relatively simple. ;) Maybe the best way to go about it is to encourage people to use it as a simple intermediate format for now and NOT a long-term storage format? Undoubtedly there will be subtle issues that will change your mind about how QOI should work, or even what it is. You've obviously found a nice local maximum in the design space; give it a little time to climb all the way to the top. :) I don't personally care about compatibility because I'm just using it as an intermediate format. QOI's defining trait that made me say "why not?" was its tiny, hackable implementation. It was a one-liner make rule to drop it into my asset pipeline, and a nearly 1:1 replacement for stb_image with some (mild) benefits. Why not, indeed. Being easily hackable, I even pushed the alpha pre-multiplication into the encoder and don't have to care whether it's considered "standard" or not. (Though I am curious if there's a technical reason for that beyond "keep it simple", which is a good reason.)
As a humble suggestion to better define the scope of this format, perhaps it could be restated as: "a simple and fast run-length encoding of byte groups". Thus, if a group is actually an RGB triplet, then it is outside the format's scope to specify the colorspace (that's semantic info which should live somewhere else). Likewise, if the groups are RGBA quads, the actual alpha format is up to the application using it. Following my naïve interpretation, there are only three info fields needed in the header besides the magic: width, height and channels. I believe these are useful as a convenience (when holding images, you get the resolution and channels) but actually required when pre-allocating the buffer where decoding will happen. And the number of channels is needed as the "byte group" size for hashing/encoding/decoding. Therefore, covering the 1- and 2-channel cases could be both very simple to support and useful, as the format would be able to hold grayscale images, indexed pixel art (actual palette info is outside the scope), tilemap info and 2-channel textures (common in certain physically-based rendering pipelines). Just my two cents.
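Under that interpretation, the header could be sketched like this. The struct and field names are hypothetical (the released QOI header additionally carries a colorspace byte), and the reader helper reflects the spec's choice of big endian discussed above:

```c
#include <stdint.h>

/* Hypothetical minimal header: magic plus the three fields needed to
   pre-allocate the decode buffer and pick the byte-group size. */
typedef struct {
    char     magic[4];  /* e.g. "qoif" */
    uint32_t width;     /* stored big-endian in the file */
    uint32_t height;
    uint8_t  channels;  /* byte-group size: 1..4 */
} minimal_header;

/* Big-endian field reader for width/height. */
static uint32_t read_u32_be(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```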
In the experimental branch I have removed …
Encoded sizes (avg) in KB: …
Screenshots and misc suffer a bit from the removal of … I have also aligned the chunk prefixes so that they are either 2-bit or 4-bit. No more 3-bit codes. This could probably lead to some performance improvement for the decoder.
Note that I rearranged the if statements in the encoding function to make it easier to experiment, but it makes encoding a bit slower. I think that's an easy win!? The new operation is very easy to implement with just two more subtractions. A rather unsuccessful experiment was to encode the "acceleration" of change for each channel instead of the "velocity" (diff) of change. It helps, but not much. We also really need a better test suite of images. As noted elsewhere, the only images with an alpha channel are the few ones in … Anyway, I probably won't have time to work much on this in the next few days, but I'm hyped to get back to it :)
If I made a PR to experimental to update the QOI_RUN_8 to an implementation more like #41 where repeated RUN_8 instructions are used more efficiently, would you be willing to merge it? Doing so should be a notable speedup for decompression, and give better file size. |
If we're throwing experiments out there, see also nigeltao/qoi2-bikeshed#20 |
It sounds attractive to have a simple-to-understand and fast-to-decode format that allows for more complexity (and innovation) on the encoder side. If a clever encoder wants to spend a lot of wall-clock time to get great compression, that doesn't harm the simplicity of the standard format. |
Hi! I like the idea of this format. I would like to see an optional section for metadata as part of the format. In general terms, it would be great to store additional information about the image in the format. I consider the metadata section optional in the specification. The "general" metadata properties could be the following: date and time of image creation, location, author, general description, etc. In addition to the "general" metadata, it should be possible to add any other metadata in "key=value" format.
If you need real-world alpha textures for the benchmark suite, I have a whole game's worth of high-res painted alpha sprites you could pick through and freely use. The slowness of the PNG format was a huge impediment in our asset pipeline, so having a fast format that performs reasonably well on sprites with transparency is something I'm quite personally invested in. Let me know and I can send a big zip of the relevant images. |
I think that abandoning 3-bit indexes can be a bad move, as it reduces the delta sizes in DIFF_16, which is one of the most used opcodes at the moment. (Maybe GDIFF_16 can work even better with a 5-4-4 DIFF_16?) Another way might be to remove DIFF_16 altogether and enlarge the GDIFF_16 deltas to 4-6-4, 4-7-3 or 3-7-3 this way.
@phoboslab … performs almost as well when I try, but also, when working with a lookup table, it is faster, since you only have to look up two values instead of four.
At the moment the implementation mentions …
Since the format will not change when finalized, would you consider changing this requirement into …?
I can't see a problem with checking the other 2-bit tags before the 8-bit ones. |
I don't understand. The only tag that can collide with …
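For reference, a dispatch sketch using the op layout of the released QOI spec (shown here as an illustration; at the time of this thread the exact tags were still in flux). The only collision is between the two full 8-bit tags and the 2-bit QOI_OP_RUN range, so a decoder that checks those two bytes first can test the remaining 2-bit tags in any order:

```c
#include <stdint.h>
#include <string.h>

/* Tag layout from the released QOI spec: four 2-bit tags in the top two
   bits, plus two 8-bit tags that occupy the highest two codepoints of the
   QOI_OP_RUN range. */
#define QOI_OP_INDEX 0x00 /* 00xxxxxx */
#define QOI_OP_DIFF  0x40 /* 01xxxxxx */
#define QOI_OP_LUMA  0x80 /* 10xxxxxx */
#define QOI_OP_RUN   0xc0 /* 11xxxxxx */
#define QOI_OP_RGB   0xfe /* 11111110 */
#define QOI_OP_RGBA  0xff /* 11111111 */

/* Classify the first byte of a chunk. The two exact-byte checks exclude
   the only codepoints that collide with the 2-bit RUN tag. */
static const char *classify(uint8_t b1) {
    if (b1 == QOI_OP_RGB)  return "RGB";
    if (b1 == QOI_OP_RGBA) return "RGBA";
    switch (b1 & 0xc0) {
        case QOI_OP_INDEX: return "INDEX";
        case QOI_OP_DIFF:  return "DIFF";
        case QOI_OP_LUMA:  return "LUMA";
        default:           return "RUN";
    }
}
```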
Copy/pasting from nigeltao/qoi2-bikeshed#28 "Demo 20: a Proof of Concept". Starting from commit aefa0f7 (2021-12-17) from the phoboslab/master branch, …
The …
Edit: added …
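The FIFO index queue being discussed can be sketched as follows. The structure and the push/lookup rules here are assumptions about the demo's idea, not its exact code: recently seen colors go into a 64-entry ring buffer, and an index op refers to "the Nth most recently pushed color".

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t r, g, b, a; } pixel;

typedef struct {
    pixel entries[64];
    int   head;          /* next write position in the ring */
} fifo_index;

static void fifo_push(fifo_index *f, pixel p) {
    f->entries[f->head] = p;
    f->head = (f->head + 1) & 63;
}

/* n = 0 is the most recently pushed color. */
static pixel fifo_get(const fifo_index *f, int n) {
    return f->entries[(f->head - 1 - n) & 63];
}

/* Encoder side: a linear search, which is what makes a naive FIFO encoder
   slower; a hash over queue positions can recover that speed. */
static int fifo_find(const fifo_index *f, pixel p) {
    for (int n = 0; n < 64; n++) {
        pixel c = fifo_get(f, n);
        if (!memcmp(&c, &p, sizeof p)) return n;
    }
    return -1;
}
```

The decoder only ever needs `fifo_push` and `fifo_get`; `fifo_find` (or a hash that replaces it) lives purely on the encoder side, which is where the compression-speed concerns in this thread apply.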
That's brilliant! I see absolutely no reason not to adopt the FIFO index queue now. However, I would strongly advocate for using … Here is a summary comparison of the different proposals for handling alpha:
Probably just a misunderstanding. I was under the impression any decoder had to follow the direction mentioned in the original source code to first check for all 8-bit tags. If it's okay to deviate from that, it's fine.
I have updated my previous post to do that (called …).
I'm sorry, but I don't like any of these tradeoffs. I believe QOI is at a very sweet spot right now. The compression gains shown don't justify the added complexity. Pending any last minute emergencies, the current spec will be the final one. |
Nigeltao's FIFO reduces decode complexity, increases compression, and barely increases encode complexity. |
@phoboslab What complexity? QOI_OP_A, QOI_OP_DIFF2, and QOI_OP_DIFF4 are all extremely simple and straightforward ops, and the implementation-cached FIFO index actually removes complexity from the spec. On top of that, they increase the compression rate of the type of images QOI struggles with the most, those with complex alpha, by almost 20% (…).
That was also the one I hoped would make it. |
Why not to move the deadline then? 😀 Imho it's totally ok. Of course it's up to @phoboslab |
FIFO doubles the compression time and puts more burden on any implementation trying to get that time back. I like the fact that even a "naive" implementation of QOI is still a good one. I will not have a discussion about more ops, sorry. It feels like we're back at last month, where I pissed off a bunch of people with very cool ideas. But this time I have a bit more confidence to say: I'm happy with the current state of QOI, and all the things we learned should be rolled into a new format.
The FIFO we are discussing does not double the compression time, you are misunderstanding what the recent development as of demo20 does. |
The "naive" implementation did double compression time. But I have to admit, not having to specify any hash function is enticing... I will do some testing! |
Here's my simple proposal refined with demo20's FIFO:
By reworking existing ops slightly and adding a single op (a 3 byte RGB encoding that often means we avoid a 4 byte QOI_OP_RGB encoding), compression is significantly improved, decoding is slightly quicker, and encode is slightly slower.
For more details: #91 |
An exhaustive search of the FIFO does slow down compression, but you don't need to do an exhaustive search. You can gain all that speed back by using a hash, much like what the current version does. But with the FIFO, that extra bit of complexity is optional instead of mandatory, and the decoder never sees it. I would agree that the FIFO index queue removes complexity overall. As for the new opcodes, while they do add complexity, they are all pretty straightforward and don't change anything about how the format already works. And if adding all three seems like too much, there are other options. While personally I'd love to have them all, diff4 + diff2, only diff4, or even only op_a (though I do think diff4 is better if you're looking for just one) are all reasonable and significantly better than having nothing at all for varying alpha. The inability to compress the alpha channel has been a major blind spot of the format imho, and it's a shame to not address that in any way when there are plenty of good options available. Speaking personally, the ability of the format to handle varying alpha is one of the biggest factors in my ability to make use of it. It's disappointing to hear that you aren't willing to consider changes and would suggest forking the format instead. I don't want to deal with a fractured format, so that doesn't sound like an appealing option to me. However you slice it, the current proposal is simpler than the first version of the format that you presented a month ago, which you rightly considered very simple at the time. If the format was simple enough back then, there's no question that it's simple enough with these changes too. |
More FIFO discussion here: #94 @notnullnotvoid yeah, 20% improvement sounds nice, but does it really matter in practice whether your images are highly compressed to 13.6% or to 11.5%? If it does, you should consider a different image format. Also, I'm not proposing to "fork" QOI. I'm trying hard to establish the one true standard for QOI here. An improved format should have a different name (QGI or QOI2, whatever) and should implement all the things that QOI does not: allowing restarts for parallel processing, block-based encoding, maybe a color transform to YUV, and then some more ops.
Yes, it does. If such a simple change led to such huge gains only for opaque images, you'd adopt it without hesitation. Why you have so much disdain for people working with transparent images, I genuinely do not understand. I am reaching out to you in the first place from a fairly generous position of compromise. I was careful to make sure that the fix I proposed was extremely simple, did not change any of what already works well in the format, and could be implemented without noticeably impacting ratio or throughput for opaque images. The format privileges opaque images over transparent ones to a pretty extreme degree, and to be honest, after these changes it still does - just less so, enough to make it more bearable. The fact that it's even possible to get such huge wins by devoting less than 1% of the opcode space to one or two multi-byte encodings is evidence enough of the severity of the problem. All I am asking you to do is make a small addition to alleviate the most glaring weak point in the format, while you still have the opportunity to do so. Seriously, I am trying to help you out here. Please let me. |
I'm not sure if I have to say this, but I have nothing against images with an alpha channel or people using such images. I'm just considering the tradeoffs here and for QOI I'm very heavily leaning towards simplicity. We already have too many complicated image formats out there. There were countless other proposals that I rejected on those grounds. So please don't take it personal. My suggestion to look into another image format was an honest one. If you have a lot of semi-transparent images, QOI may not be the best choice. As you said yourself, even with those proposed fixes QOI still performs badly when dealing with complicated alpha channels. |
@phoboslab I believe that what made this format gain so much traction in the first place is its simplicity and speed, because people do look for such things. There will also always be room for improvement. As I see it, there are 3 choices: …
@DanielMagen As you can imagine, I agree that QOI got the attention because of its simplicity. I mean, FLIF is awesome in terms of compression, but as far as I can tell there's exactly one implementation of it. Regarding point 2: I guess the proposal of just using the color hash as a FIFO (#94) would kindle your romance?
Naively, it looks to me like you're trying to squeeze the last bits from the cache table and sacrificing a lot of simplicity for it. But I'll let other people decide whether it's a valid direction.
Sorry, I'm not sure how to read your comment. You mean the FIFO sacrifices simplicity?
That'll be me, I guess. And I've been staring at this whole thing for too long...
Sorry, I'll make myself more clear. The "highest hit to simplicity" I see is that to give a good (fast) implementation of QOI with the FIFO approach, one would have to use the hash trick you used. So even if the spec is simplified, in practice this would add more code to any implementation, with negligible effects on compression. This would also add more burden for someone trying to implement QOI, in the form of needing to read or come up with (if the hash trick is not specified) the hash/FIFO idea and choose a good hash.
Just to be clear, the spec is now locked/fixed/unchanging (at least for anything major), right? I wanted to pitch an implementation to be merged into production software. Just wanted to know where exactly things stand. :) Oof, dammit, saw the other issue about 30 seconds later. Yeah, anyone else who finds themselves here: at least at the moment, things seem fixed. So ciao.
I want to apologize.
I may have been too quick with announcing the file format to be finished. I'm frankly overwhelmed with the attention this is getting. With all the implementations already out there, I thought it was a good idea to finalize the specification ASAP. I'm no longer sure if that was the right decision.
QOI is probably good enough the way it is now, but I'm wondering if there are things that could be done better — without sacrificing the simplicity or performance of this format.
One of these things is the fact that QOI_RUN_16 was determined to be pretty useless, and QOI could become even simpler by just removing it. Maybe there are more easy wins with a different hash function or by distributing some bits differently? I don't know.
At the risk of annoying everyone: how do you all feel about giving QOI a bit more time to mature?
To be clear, the things I'd be willing to discuss here are fairly limited: …
What I'm looking for specifically is: …
Should we set a deadline in 2-3 weeks to produce the really-final (pinky promise) specification? Or should we just leave it as it is?
Again, I'm very sorry for the confusing messaging!
Edit: Thanks for your feedback. Let's produce the final spec by 2021.12.20.