How about mmapping these bad boys? #2

Open
rob-p opened this issue Feb 18, 2019 · 6 comments
Comments

@rob-p

rob-p commented Feb 18, 2019

Hi again, @gmarcais!

Another random question / feature request. Imagine that I want to use a compact_vector to store a very large array of encoded integers (e.g. a large suffix array). I'm going to compute this array once, at great cost, and then use it many times. If the vector is sufficiently large, one spends a lot of time deserializing it into RAM. However, since the layout is so nice, it might make sense to just mmap it so that we can start using it immediately. What do you think is the best way to do this with compact_vector?
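To make the mmap idea concrete, here is a minimal sketch of mapping a file of packed 64-bit words read-only on a POSIX system. `MappedFile` is a hypothetical helper, not part of compact_vector, and the sketch assumes the file holds nothing but raw words in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close, write, unlink

// Hypothetical helper (not part of compact_vector): a read-only mapping of a
// whole file. Pages are faulted in lazily, so nothing is deserialized up front.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);
        struct stat st;
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) {  // mmap rejects zero-length mappings
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed: " + path);
            }
            data_ = p;
        }
        ::close(fd);  // the mapping remains valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_ != nullptr) ::munmap(data_, size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    // Assumption: the file content is an array of host-endian 64-bit words.
    const std::uint64_t* words() const {
        return static_cast<const std::uint64_t*>(data_);
    }
    std::size_t bytes() const { return size_; }

private:
    void* data_ = nullptr;
    std::size_t size_ = 0;
};
```

The pointer returned by `words()` stays valid only for the lifetime of the `MappedFile` object, so anything that reads through it must not outlive the mapping.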

@gmarcais
Owner

My first thought is: can an Allocator class be used that is backed by an mmapped file? All the constructors take (or should take) an allocator object as their last argument.

Is it enough? Is there a need for explicit support in compact_vector?

@rob-p
Author

rob-p commented Feb 18, 2019

The only issue I envision is that one may want to use an mmap allocator sometimes but not always (e.g. contingent on an input argument). I think the allocator approach could work as long as compact_vector is polymorphic allocator aware (https://en.cppreference.com/w/cpp/memory/polymorphic_allocator). Basically, one would want the allocator type to not modify the overall type of the compact_vector.

@gmarcais
Owner

The allocator is a template parameter. Isn't it sufficient to do something like:

```cpp
template<typename IDX, unsigned BITS = 0, typename W = uint64_t>
using ts_vector = compact_vector::ts_vector<IDX, BITS, W, std::pmr::polymorphic_allocator<W>>;
```

Now you can use ts_vector with polymorphic allocators to your heart's content, choosing at runtime which allocator to use?

I'll admit I have not used polymorphic allocators yet. I would be curious how easy or difficult it is to write such an allocator.
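As a rough indication of the effort involved, a custom `std::pmr::memory_resource` only needs three overrides. The sketch below uses anonymous mmap rather than a file and `std::pmr::vector` as a stand-in for the compact vector; `MmapResource`, `word_vector` and `make_words` are hypothetical names, and whether compact_vector works unchanged with a polymorphic allocator is not verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <new>
#include <vector>

#include <sys/mman.h>  // mmap, munmap (POSIX)

// Hypothetical resource: every allocation is its own anonymous mapping.
// mmap returns page-aligned memory, so ordinary alignments are satisfied;
// alignments above the page size are not handled in this sketch.
class MmapResource : public std::pmr::memory_resource {
    void* do_allocate(std::size_t bytes, std::size_t /*alignment*/) override {
        if (bytes == 0) bytes = 1;  // mmap rejects zero-length mappings
        void* p = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::bad_alloc();
        return p;
    }
    void do_deallocate(void* p, std::size_t bytes,
                       std::size_t /*alignment*/) override {
        ::munmap(p, bytes == 0 ? 1 : bytes);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

// Stand-in for the compact vector: the container type is the same
// regardless of which resource backs it.
using word_vector = std::pmr::vector<std::uint64_t>;

// Pick the backing memory at runtime, e.g. from a command-line flag.
inline word_vector make_words(bool use_mmap,
                              std::pmr::memory_resource& mapped,
                              std::size_t n) {
    std::pmr::memory_resource* r =
        use_mmap ? &mapped : std::pmr::new_delete_resource();
    return word_vector(n, std::uint64_t{0}, r);
}
```

Note that an allocator only supplies memory. The container still initializes its elements after allocating, so this alone does not make pre-existing file contents appear in the vector; it addresses the "same type, different backing memory" part of the question.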

@Bouncner
Contributor

Having a way to serialize the data structures (just a few pointers with offsets and a few widths) and a constructor that takes the same data would already be of great value for our research database.

From skimming the code, I haven't seen anything that would rule such methods out.

@gmarcais
Owner

There are really two classes. A compact iterator does all the actual work. All it cares about is having a base pointer and the length of the elements in bits. It really does not care where the memory comes from (allocated by the vector class, or directly with malloc or mmap). The vector class does very little: it allocates memory using the allocator template parameter, and otherwise delegates everything to the iterator.
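To illustrate why a base pointer and a bit width are enough, here is a minimal sketch of reading and writing fixed-width elements in an array of 64-bit words. The layout (elements packed back to back, least significant bits first, possibly straddling two words) is an assumption made for illustration and may differ from what compact_vector actually does; `packed_get` and `packed_set` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mask with the low `bits` bits set, for 1 <= bits <= 64.
inline std::uint64_t low_mask(unsigned bits) {
    return bits == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << bits) - 1;
}

// Read element i. `base` may point to malloc'ed, allocator-provided or
// mmapped memory; the code cannot tell the difference.
inline std::uint64_t packed_get(const std::uint64_t* base, unsigned bits,
                                std::size_t i) {
    assert(bits >= 1 && bits <= 64);
    const std::size_t bitpos = i * bits;
    const std::size_t word = bitpos / 64;
    const unsigned off = static_cast<unsigned>(bitpos % 64);
    std::uint64_t v = base[word] >> off;
    if (off + bits > 64) {
        // Element straddles two words; off > 0 here, so the shift is < 64.
        v |= base[word + 1] << (64 - off);
    }
    return v & low_mask(bits);
}

// Write element i, leaving neighbouring elements untouched.
inline void packed_set(std::uint64_t* base, unsigned bits, std::size_t i,
                       std::uint64_t x) {
    assert(bits >= 1 && bits <= 64);
    const std::size_t bitpos = i * bits;
    const std::size_t word = bitpos / 64;
    const unsigned off = static_cast<unsigned>(bitpos % 64);
    const std::uint64_t mask = low_mask(bits);
    x &= mask;
    base[word] = (base[word] & ~(mask << off)) | (x << off);
    if (off + bits > 64) {
        const unsigned stored = 64 - off;  // bits that fit in the first word
        base[word + 1] = (base[word + 1] & ~(mask >> stored)) | (x >> stored);
    }
}
```

With 13-bit elements, for example, element 4 occupies bits 52 through 64 and therefore spans the first two words, which is the case the `off + bits > 64` branch handles.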

There are a few non-standard calls: get() returns the base pointer of the vector, bytes() returns the number of bytes in the vector (up to capacity, not just length), and bits() returns the number of bits per element. So everything is there to serialize the data structure.
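A serializer built on those calls could look roughly like the sketch below. It relies only on `get()`, `bytes()` and `bits()` as described above, plus a `size()` member; the exact return types, the `PackedHeader` layout and the name `dump_packed` are assumptions, and the output is host-endian with no versioning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>

// Hypothetical on-disk header; not a format defined by compact_vector.
struct PackedHeader {
    std::uint64_t size;   // number of elements
    std::uint64_t bits;   // bits per element
    std::uint64_t bytes;  // payload length in bytes (up to capacity)
};

// Works with any type exposing size(), bits(), bytes() and get().
template <typename Vec>
void dump_packed(const Vec& v, std::ostream& out) {
    const PackedHeader h{static_cast<std::uint64_t>(v.size()),
                         static_cast<std::uint64_t>(v.bits()),
                         static_cast<std::uint64_t>(v.bytes())};
    out.write(reinterpret_cast<const char*>(&h), sizeof h);
    out.write(reinterpret_cast<const char*>(v.get()),
              static_cast<std::streamsize>(h.bytes));
}
```

Because bytes() covers the capacity rather than just the length, the payload may include unused trailing words; a reader can recover the meaningful prefix from the header's size and bits fields.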

Unfortunately, the vector class can't be recreated directly from these elements, although it could be done with an allocator. It is probably best done within an allocator anyway, as otherwise methods that resize the vector may use the wrong type of allocator to manage the pointer.

@Bouncner
Contributor

Argh, yes. We are using the vectors read-only. Never thought about growing.
