
Provide an API #1

Open
nemequ opened this issue Jun 1, 2016 · 10 comments

@nemequ
Contributor

nemequ commented Jun 1, 2016

I was just thinking about throwing together a squash plugin; it would be a lot easier if you provided a public API.

@akamiru
Owner

akamiru commented Jun 1, 2016

Thanks for your request! I think I will find the time to implement one tomorrow. Do you have any specific requests on the exact implementation?

@nemequ
Contributor Author

nemequ commented Jun 1, 2016

https://github.com/quixdb/squash/blob/master/docs/internals.md explains some of the basic patterns pretty much every compression library follows. It's geared more towards helping people understand how Squash plugins work, but it should give you a good idea of what your API should look like… in a perfect world, the Squash plugin would just be a very thin wrapper to call the underlying library. Unfortunately, the easiest one to implement (all-in-one) is also the least flexible, and the most flexible (zlib-style stream) is the most difficult to implement…
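To make the trade-off concrete, the two extremes might look roughly like this (hypothetical bce_-prefixed names, purely for illustration):

```cpp
#include <cstddef>
#include <cstdint>

// All-in-one style: simplest to implement, least flexible. The caller
// hands over the whole input and a large-enough output buffer at once.
int bce_compress(uint8_t* dst, size_t* dst_len,
                 const uint8_t* src, size_t src_len);

// zlib-style streaming: hardest to implement, most flexible. The caller
// feeds input and drains output incrementally, in chunks of any size.
struct bce_stream {
    const uint8_t* next_in;   // next input byte to consume
    size_t         avail_in;  // input bytes remaining
    uint8_t*       next_out;  // where the next output byte goes
    size_t         avail_out; // output space remaining
};

int bce_stream_process(bce_stream* strm);  // consume/produce what it can
int bce_stream_finish(bce_stream* strm);   // flush any remaining state
```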

As for details, I would suggest:

And some stuff which you may not be able to do because C++ makes it difficult:

  • Use a workmem argument (take a look at LZO's API for an example) so people can reuse memory across multiple operations instead of allocating/freeing inside your code.
  • If you can't do that, give people a way to provide their own malloc/free implementations. (Both options are sketched below.)
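A minimal sketch of both options, again with hypothetical bce_-prefixed names (LZO's lzo1x_1_compress(), for comparison, takes a caller-provided wrkmem pointer):

```cpp
#include <cstddef>
#include <cstdint>

// Option 1, LZO-style workmem: the caller owns the scratch buffer and
// can reuse it across calls, so the library never allocates internally.
size_t bce_compress_workmem_size(size_t src_len);  // scratch bytes required
int    bce_compress_workmem(uint8_t* dst, size_t* dst_len,
                            const uint8_t* src, size_t src_len,
                            void* workmem);        // caller-provided scratch

// Option 2, custom allocators: the caller swaps in their own malloc/free.
typedef void* (*bce_malloc_fn)(size_t size, void* user_data);
typedef void  (*bce_free_fn)(void* ptr, void* user_data);

int bce_compress_cb(uint8_t* dst, size_t* dst_len,
                    const uint8_t* src, size_t src_len,
                    bce_malloc_fn malloc_cb, bce_free_fn free_cb,
                    void* user_data);
```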

@akamiru
Owner

akamiru commented Jun 1, 2016

This seems to boil down to a very low-level implementation of BCE, which I'd like to avoid since I want to stay pretty much pure C++14, but I will try to get something along those lines working (at least a C way to interface with a basic API).

I already use the uintX_t types in the code, including uint8_t for char data.

I use std::vector for almost all memory requests. I think passing through the Allocator pattern is the correct way to hand over the allocation scheme in C++, so I'll make this available. While there is an O(n) upper bound on the memory used, I haven't calculated it exactly for this release yet (it's a bit more complex than with BCE v0.2/0.3), so using fixed blocks of memory is hard anyway. Moreover, it would waste a lot of memory in most cases.
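A minimal sketch of that Allocator pattern, assuming nothing about BCE's internals (allocate/deallocate go through std::malloc/std::free here, but could just as well call user-supplied callbacks):

```cpp
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Minimal C++14-style allocator; not BCE's actual allocator, just the
// general shape. Routing the two calls through user callbacks instead
// of malloc/free is a small change.
template <class T>
struct callback_allocator {
    using value_type = T;

    callback_allocator() = default;
    template <class U>
    callback_allocator(const callback_allocator<U>&) {}

    T* allocate(std::size_t n) {
        if (void* p = std::malloc(n * sizeof(T))) return static_cast<T*>(p);
        throw std::bad_alloc();  // the contract std::vector expects
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};

template <class T, class U>
bool operator==(const callback_allocator<T>&, const callback_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const callback_allocator<T>&, const callback_allocator<U>&) { return false; }

// Internal buffers can then be parameterized on the allocator:
using buffer = std::vector<uint8_t, callback_allocator<uint8_t>>;
```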

@akamiru
Owner

akamiru commented Jun 21, 2016

The clean-up branch now has a header-file-based library (bce.h + builtins.h). See main.cpp for usage.

It allows for memory-to-memory compression and exposes all allocators (I hope I didn't miss one).

@nemequ
Contributor Author

nemequ commented Jun 21, 2016

FWIW, it's probably going to be a while before I get around to writing a Squash plugin for this. It's going to be a bit of a pain since the API doesn't really match up with what we need, and I don't like C++, so I'll probably procrastinate :/. AFAICT there won't be a way to do it without some memcpy()s, so performance probably won't be great…

@akamiru
Owner

akamiru commented Jun 22, 2016

BCE already copies and mallocs a lot internally (e.g. queue::packed and queue::vector; strangely, freeing and reallocating actually makes BCE faster on my machine). Moreover, I have to copy the streams of the coders behind each other at the end. Sadly, I have no idea how to interleave/append them without additional synchronization, and the parallel OpenMP part speeds up the BCE stage by a factor of 3.
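The "behind each other" layout is basically length-prefixed concatenation, roughly like this (an illustrative sketch, not the exact container format):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Each coder fills its own buffer in parallel; at the end the buffers
// are copied behind each other with an 8-byte length prefix, so the
// decoder can split them again without synchronizing during coding.
std::vector<uint8_t> concatenate(const std::vector<std::vector<uint8_t>>& streams) {
    std::vector<uint8_t> out;
    for (const auto& s : streams) {
        uint64_t len = s.size();
        uint8_t prefix[sizeof len];
        std::memcpy(prefix, &len, sizeof len);           // native-endian length
        out.insert(out.end(), prefix, prefix + sizeof len);
        out.insert(out.end(), s.begin(), s.end());       // coder payload
    }
    return out;
}
```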

What I'm trying to say is that BCE runs at around 5 MB/s on my 4770K, so I don't think a few copies will hurt it that much.

Anyway, I will rewrite some of the coder part to immediately return the coded chunk; this should result in a simpler API without loss of compression.

@nemequ
Contributor Author

nemequ commented Jun 22, 2016

Looking at main.cpp, won't it fail if the input size is > 2^30? It reads input until it has reached the block size, then finalizes it… it seems like you're missing an outer loop to read multiple blocks, or am I missing something?
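Something like this outer loop is what I'd expect (a sketch; block_size and the commented-out calls are stand-ins for whatever main.cpp actually uses):

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Sketch of the missing outer loop: keep reading fixed-size blocks
// until the stream is exhausted, compressing each one as it arrives.
void compress_file(std::ifstream& in) {
    constexpr std::size_t block_size = std::size_t{1} << 30;  // 2^30 bytes
    std::vector<char> block(block_size);
    while (in.read(block.data(), static_cast<std::streamsize>(block.size())) ||
           in.gcount() > 0) {
        std::size_t got = static_cast<std::size_t>(in.gcount());
        // compress(block.data(), got);  // hypothetical per-block call
        (void)got;
    }
    // finalize();                       // hypothetical final call
}
```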

@akamiru
Owner

akamiru commented Jun 23, 2016

In short: the compress call feeds BCE's coders, and finalize() ties the data together and returns it. This will change soon, and compress() will return instantly.

The problem I currently have is the following: BCE uses 8 independent arithmetic coders with an adaptive model. This allows it to run on 8 threads in parallel with a bit of synchronization (the speed-up is around 3). But sadly, training these models is quite costly (1-2% I'd guess), so the current version reuses the coders for every block (read: call to compress()) and outputs them copied behind each other in finalize(). This is really suboptimal, so I'll extend the coders to be resettable so that I can output a chunk directly with each compress() call. Moreover, I'm trying to find out how to pretrain the models to reduce the impact, and maybe isolate the blocks completely.
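A rough sketch of what "resettable" could mean here (illustrative only, not the actual coder code): snapshot the pretrained state once, then restore it at every block boundary instead of retraining:

```cpp
#include <array>
#include <cstdint>

// Illustrative stand-in for an adaptive model: per-symbol statistics
// that are updated as data is coded (i.e. "trained").
struct adaptive_model {
    std::array<uint16_t, 256> freq{};                // adaptive statistics
    void update(uint8_t symbol) { ++freq[symbol]; }  // one training step
};

struct resettable_model {
    adaptive_model snapshot;  // state captured after pretraining
    adaptive_model current;   // state the coder actually adapts

    void freeze() { snapshot = current; }  // call once pretraining is done
    void reset()  { current = snapshot; }  // cheap per-block reset
};
```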

@akamiru
Owner

akamiru commented Jun 29, 2016

The clean-up branch now contains a different API that you should like better. Moreover, it builds completely with Visual Studio 2015, GCC 6.1 and Clang 3.8.

@nemequ
Contributor Author

nemequ commented Jul 11, 2016

Just FYI, I'm blocking on y-256/libdivsufsort#15 right now. Once that's merged, I have a patch for bce ready to go, then I'll try to throw together a plugin for Squash.
