libzdb: rewrite index memory handling #127
Draft
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Previous index memory management was basically one large array of 128 MB and each block of 8 bytes was a pointer to a linked-list of entries.
Each entry was the object entry with an extra link for the namespace because all namespace were sharing the same master-array.
To find the correct list index, the crc32 of the key were used and the first 24 bits was the array-id, this method was really fast but needed to maintains a unique full large shared buffer. When removing a namespace, deleting process could be very long to iterate over all keys (for all namespaces) even if namespace contains few keys.
Real memory usage was quite effcient actually, thanks Linux, only effective used pages was allocated, so the large 128 MB array was dynamically allocated by the kernel but on really large database, limit can quickly come and sharing the array was anyway not a good idea IMO.
Here is a new implementation, in test. This new implementation use a dedicated index map per namespace, the namespace pointer per entry is thus not needed anymore, it's more logic and less error prone, no sharing across namespace anymore.
When deleting a namespace, the complete isolated map can be free'd without interfering with other keys/namespace.
There is still a crc32 computing on the key and that value will point to an array entry aswell, but this array become multi-level and levels are allocated dynamically. Each level is composed of an array of 16 pointers. This version use 5 level (20 bits out of 32). This is configurable.
Test were made on 2 millions keys namespace, memory usage can be reduced up to 6 times than before and is more predictible (eg: 2 times the same namespace will use 2 times the memory). More test to come, performance doesn't seems to be affected a lot, but it's slower (lookup and allocation are slower, but difference is not really noticable).