
Provide an API #1

Open
nemequ opened this issue Jun 1, 2016 · 10 comments

@nemequ
Contributor

nemequ commented Jun 1, 2016

I was just thinking about throwing together a squash plugin; it would be a lot easier if you provided a public API.

@akamiru
Owner

akamiru commented Jun 1, 2016

Thanks for your request! I think I will find the time to implement one tomorrow. Do you have any specific requests on the exact implementation?

@nemequ
Contributor Author

nemequ commented Jun 1, 2016

https://github.com/quixdb/squash/blob/master/docs/internals.md explains some of the basic patterns pretty much every compression library follows. It's geared more towards helping people understand how Squash plugins work, but it should give you a good idea of what your API should look like… in a perfect world, the Squash plugin would just be a very thin wrapper to call the underlying library. Unfortunately, the easiest one to implement (all-in-one) is also the least flexible, and the most flexible (zlib-style stream) is the most difficult to implement…
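To make the trade-off concrete, the two extremes might look roughly like this (hypothetical bce_-prefixed names, purely for illustration):

```cpp
#include <cstddef>
#include <cstdint>

// All-in-one style: simplest to implement, least flexible. The caller
// hands over the whole input and a large-enough output buffer at once.
int bce_compress(uint8_t* dst, size_t* dst_len,
                 const uint8_t* src, size_t src_len);

// zlib-style streaming: hardest to implement, most flexible. The caller
// feeds input and drains output incrementally, in chunks of any size.
struct bce_stream {
    const uint8_t* next_in;   // next input byte to consume
    size_t         avail_in;  // input bytes remaining
    uint8_t*       next_out;  // where the next output byte goes
    size_t         avail_out; // output space remaining
};

int bce_stream_process(bce_stream* strm);  // consume/produce what it can
int bce_stream_finish(bce_stream* strm);   // flush any remaining state
```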

As for details, I would suggest:

And some stuff which you may not be able to do because C++ makes it difficult:

  • Use a workmem argument (take a look at LZO's API for an example) so people can reuse memory across multiple operations instead of allocating/freeing inside your code.
  • If you can't do that, give people a way to provide their own malloc/free implementations. (Both options are sketched below.)
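A minimal sketch of both options, again with hypothetical bce_-prefixed names (LZO's lzo1x_1_compress(), for comparison, takes a caller-provided wrkmem pointer):

```cpp
#include <cstddef>
#include <cstdint>

// Option 1, LZO-style workmem: the caller owns the scratch buffer and
// can reuse it across calls, so the library never allocates internally.
size_t bce_compress_workmem_size(size_t src_len);  // scratch bytes required
int    bce_compress_workmem(uint8_t* dst, size_t* dst_len,
                            const uint8_t* src, size_t src_len,
                            void* workmem);        // caller-provided scratch

// Option 2, custom allocators: the caller swaps in their own malloc/free.
typedef void* (*bce_malloc_fn)(size_t size, void* user_data);
typedef void  (*bce_free_fn)(void* ptr, void* user_data);

int bce_compress_cb(uint8_t* dst, size_t* dst_len,
                    const uint8_t* src, size_t src_len,
                    bce_malloc_fn malloc_cb, bce_free_fn free_cb,
                    void* user_data);
```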

@akamiru
Owner

akamiru commented Jun 1, 2016

This seems to boil down to a very low-level implementation of BCE, which I'd like to avoid since I want to stay pretty much pure C++14, but I will try to get something along those lines working (at least a C way to interface with a basic API).

I already use the uintX_t types in the code, including uint8_t for char data.

I use std::vector for almost all memory requests. I think passing through the Allocator pattern is the correct way to hand over the allocation scheme in C++, so I'll make this available. While there is an O(n) upper bound on the memory used, I haven't calculated it exactly for this release yet (it's a bit more complex than with BCE v0.2/0.3), so using fixed blocks of memory is hard anyway. Moreover, it would waste a lot of memory in most cases.
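A minimal sketch of that Allocator pattern, assuming nothing about BCE's internals (allocate/deallocate go through std::malloc/std::free here, but could just as well call user-supplied callbacks):

```cpp
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Minimal C++14-style allocator; not BCE's actual allocator, just the
// general shape. Routing the two calls through user callbacks instead
// of malloc/free is a small change.
template <class T>
struct callback_allocator {
    using value_type = T;

    callback_allocator() = default;
    template <class U>
    callback_allocator(const callback_allocator<U>&) {}

    T* allocate(std::size_t n) {
        if (void* p = std::malloc(n * sizeof(T))) return static_cast<T*>(p);
        throw std::bad_alloc();  // the contract std::vector expects
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};

template <class T, class U>
bool operator==(const callback_allocator<T>&, const callback_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const callback_allocator<T>&, const callback_allocator<U>&) { return false; }

// Internal buffers can then be parameterized on the allocator:
using buffer = std::vector<uint8_t, callback_allocator<uint8_t>>;
```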

@akamiru
Owner

akamiru commented Jun 21, 2016

The clean-up branch now has a header-file-based library (bce.h + builtins.h). See main.cpp for usage.

It allows for memory-to-memory compression and exposes all allocators (I hope I didn't miss one).

@nemequ
Contributor Author

nemequ commented Jun 21, 2016

FWIW, it's probably going to be a while before I get around to writing a Squash plugin for this. It's going to be a bit of a pain since the API doesn't really match up with what we need, and I don't like C++, so I'll probably procrastinate :/. AFAICT there won't be a way to do it without some memcpy()s, so performance probably won't be great…

@akamiru
Owner

akamiru commented Jun 22, 2016

BCE already copies and mallocs a lot internally (e.g. queue::packed and queue::vector; strangely, freeing and reallocating actually makes BCE faster on my machine). Moreover, I have to copy the streams of the coders behind each other at the end. Sadly, I have no idea how to interleave/append them without additional synchronization, and the parallel OpenMP part speeds up the BCE stage by a factor of 3.
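The "behind each other" layout is basically length-prefixed concatenation, roughly like this (an illustrative sketch, not the exact container format):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Each coder fills its own buffer in parallel; at the end the buffers
// are copied behind each other with an 8-byte length prefix, so the
// decoder can split them again without synchronizing during coding.
std::vector<uint8_t> concatenate(const std::vector<std::vector<uint8_t>>& streams) {
    std::vector<uint8_t> out;
    for (const auto& s : streams) {
        uint64_t len = s.size();
        uint8_t prefix[sizeof len];
        std::memcpy(prefix, &len, sizeof len);           // native-endian length
        out.insert(out.end(), prefix, prefix + sizeof len);
        out.insert(out.end(), s.begin(), s.end());       // coder payload
    }
    return out;
}
```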

What I'm trying to say is that BCE runs at around 5 MB/s on my 4770K, so I don't think a few copies will hurt it that much.

Anyway, I will rewrite some of the coder part to immediately return the coded chunk; this should result in a simpler API without loss of compression.

@nemequ
Contributor Author

nemequ commented Jun 22, 2016

Looking at main.cpp, won't it fail if the input size is > 2^30? It reads input until it has reached the block size, then finalizes it… it seems like you're missing an outer loop to read multiple blocks, or am I missing something?
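Something like this outer loop is what I'd expect (a sketch; block_size and the commented-out calls are stand-ins for whatever main.cpp actually uses):

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Sketch of the missing outer loop: keep reading fixed-size blocks
// until the stream is exhausted, compressing each one as it arrives.
void compress_file(std::ifstream& in) {
    constexpr std::size_t block_size = std::size_t{1} << 30;  // 2^30 bytes
    std::vector<char> block(block_size);
    while (in.read(block.data(), static_cast<std::streamsize>(block.size())) ||
           in.gcount() > 0) {
        std::size_t got = static_cast<std::size_t>(in.gcount());
        // compress(block.data(), got);  // hypothetical per-block call
        (void)got;
    }
    // finalize();                       // hypothetical final call
}
```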

@akamiru
Owner

akamiru commented Jun 23, 2016

In short: the compress call feeds BCE's coders, and finalize() ties the data together and returns it. This will change soon, and compress() will return instantly.

The problem I currently have is the following: BCE uses 8 independent arithmetic coders with an adaptive model. This allows it to run on 8 threads in parallel with a bit of synchronization (the speed-up is around 3). But sadly, training these models is quite costly (1-2% I'd guess), so the current version reuses the coders for every block (read: call to compress()) and outputs them copied behind each other in finalize(). This is really suboptimal, so I'll extend the coders to be resettable so that I can output a chunk directly with each compress() call. Moreover, I'm trying to find out how to pretrain the models to reduce the impact, and maybe isolate the blocks completely.
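A rough sketch of what "resettable" could mean here (illustrative only, not the actual coder code): snapshot the pretrained state once, then restore it at every block boundary instead of retraining:

```cpp
#include <array>
#include <cstdint>

// Illustrative stand-in for an adaptive model: per-symbol statistics
// that are updated as data is coded (i.e. "trained").
struct adaptive_model {
    std::array<uint16_t, 256> freq{};                // adaptive statistics
    void update(uint8_t symbol) { ++freq[symbol]; }  // one training step
};

struct resettable_model {
    adaptive_model snapshot;  // state captured after pretraining
    adaptive_model current;   // state the coder actually adapts

    void freeze() { snapshot = current; }  // call once pretraining is done
    void reset()  { current = snapshot; }  // cheap per-block reset
};
```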

@akamiru
Owner

akamiru commented Jun 29, 2016

The clean-up branch now contains a different API that you should like better. Moreover, it builds completely with Visual Studio 2015, GCC 6.1 and Clang 3.8.

@nemequ
Contributor Author

nemequ commented Jul 11, 2016

Just FYI, I'm blocking on y-256/libdivsufsort#15 right now. Once that's merged, I have a patch for bce ready to go, then I'll try to throw together a plugin for Squash.
