Adding support for sharing memory between the module and the engine #1804
base: unstable
Planning to add unit tests soon. Publishing early to get the discussion rolling.
Sharing memory between the module and engine reduces memory overhead by eliminating redundant copies of stored entries in the module. This is particularly beneficial for search workloads that require indexing large volumes of stored data.

Shared SDS, a new data type, facilitates module-engine memory sharing with thread-safe intrusive reference counting. It preserves SDS semantics and structure while adding ref-counting and a free callback for statistics tracking.

New module APIs:
- VM_CreateSharedSDS: Creates a new Shared SDS.
- VM_SharedSDSPtrLen: Retrieves the raw buffer pointer and length of a Shared SDS.
- VM_ReleaseSharedSDS: Decreases the Shared SDS ref-count by 1.

Extended module APIs:
- VM_HashSet: Now supports setting a shared SDS in the hash.
- VM_HashGet: Retrieves a shared SDS and increments its ref-count by 1.
104a4dd to f9aad1a
Codecov Report
Attention: Patch coverage is

@@            Coverage Diff             @@
##           unstable    #1804    +/-   ##
============================================
- Coverage     70.99%   70.93%   -0.06%
============================================
  Files           123      123
  Lines         65651    65749     +98
============================================
+ Hits          46609    46642     +33
- Misses        19042    19107     +65
Consensus is that we don't want to rush the implementation and commit to a specific API now that we are technically past the new API cutoff for the release. If we can converge offline on the design quickly, we'll merge it; otherwise we'll wait until 9.0.
@zuiderkwast, @ranshid, happy to discuss further with you! It would be valuable for ValkeySearch 1.0 to have such an interface in 8.1; after all, there is no second chance to make a good first impression ;).
dd9a9d2 to d601ba1
Signed-off-by: yairgott <[email protected]>
d601ba1 to 47a9487
Overview
Sharing memory between the module and engine reduces memory overhead by eliminating redundant copies of stored records in the module. This is particularly beneficial for search workloads that require indexing large volumes of documents.
Vectors
Vector similarity search requires storing large volumes of high-cardinality vectors. For example, a single vector with 512 dimensions consumes 2048 bytes (4 bytes per dimension), and typical workloads often involve millions of vectors. Because there is no memory-sharing mechanism between the module and the engine, ValkeySearch currently doubles memory consumption when indexing vectors, significantly increasing operational costs. This limitation introduces adoption friction and reduces ValkeySearch's competitiveness.
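To make the arithmetic concrete, here is a minimal sketch of the footprint calculation. It assumes 4-byte float components, and the helper names (vector_bytes, total_vector_bytes) are made up for illustration; they are not part of the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store one vector of `dims` float components
 * (assumption: float is 4 bytes, as on common platforms). */
static size_t vector_bytes(size_t dims) {
    return dims * sizeof(float);
}

/* Total bytes for `count` vectors kept in `copies` independent copies.
 * copies == 2 models the engine and the module each holding one copy;
 * copies == 1 models a single shared copy. */
static uint64_t total_vector_bytes(uint64_t count, size_t dims, unsigned copies) {
    assert(copies > 0);
    return count * (uint64_t)vector_bytes(dims) * copies;
}
```

For one million 512-dimension vectors this gives 2,048,000,000 bytes for a single copy and 4,096,000,000 bytes when both sides keep their own copy.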
Implementation Details
Memory Allocation Strategy
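At a fundamental level, there are two primary allocation strategies: either the engine allocates the memory and shares it with the module, or the module allocates the memory and shares it with the engine.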
For ValkeySearch, it is crucial that vectors reside in cache-aligned memory to maximize SIMD optimizations. Allowing the module to allocate memory provides greater flexibility for different use cases, though it introduces slightly higher implementation complexity.
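As a rough illustration of what module-side allocation could look like, the sketch below allocates a cache-line-aligned float buffer with the C11 aligned_alloc call. The 64-byte line size and the helper names are assumptions for illustration; a real module would more likely go through the module allocator than libc directly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64 /* assumption: common on x86-64, varies by CPU */

/* Allocate room for `dims` floats, aligned to a cache line.
 * aligned_alloc (C11) expects the size to be a multiple of the alignment,
 * so the request is rounded up. Release the buffer with free(). */
static float *alloc_aligned_vector(size_t dims) {
    size_t bytes = dims * sizeof(float);
    size_t rounded = (bytes + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
    if (rounded == 0) rounded = CACHE_LINE_SIZE; /* avoid a zero-size request */
    return aligned_alloc(CACHE_LINE_SIZE, rounded);
}

/* Returns 1 if `p` starts on a cache-line boundary, 0 otherwise. */
static int is_cache_aligned(const void *p) {
    return ((uintptr_t)p % CACHE_LINE_SIZE) == 0;
}
```

If the engine owned the allocation instead, the module could not choose the alignment, which is the flexibility argument made above.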
Shared SDS
Shared SDS, a new data type, facilitates module-engine memory sharing with thread-safe intrusive reference counting. It preserves SDS semantics and structure while adding ref-counting and a free callback for statistics tracking.
A core component that enables thread-safe buffer sharing could be beneficial for use cases beyond modules. One notable advantage is avoiding deep copies of buffers when IO threading is enabled.
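The idea of thread-safe intrusive ref-counting with a free callback can be sketched roughly as follows. This is not the PR's Shared SDS layout: the struct, the function names, and the stats callback are all hypothetical, and the header is simplified compared to a real SDS header.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback invoked once, just before the buffer is freed. */
typedef void (*shared_buf_free_cb)(size_t len);

/* Hypothetical layout: the ref-count lives in the same allocation as the
 * bytes (intrusive), so sharing needs no separate control block. */
typedef struct shared_buf {
    atomic_uint refcount;
    size_t len;
    shared_buf_free_cb on_free;
    char data[];
} shared_buf;

/* Example stats hook: accumulates the number of bytes released. */
static size_t freed_bytes_total;
static void track_freed(size_t len) { freed_bytes_total += len; }

/* Creates a buffer holding a copy of `src`, with a ref-count of 1. */
static shared_buf *shared_buf_create(const char *src, size_t len, shared_buf_free_cb cb) {
    shared_buf *b = malloc(sizeof(*b) + len + 1);
    if (b == NULL) return NULL;
    atomic_init(&b->refcount, 1);
    b->len = len;
    b->on_free = cb;
    if (len > 0) memcpy(b->data, src, len);
    b->data[len] = '\0';
    return b;
}

/* Takes an additional reference. */
static void shared_buf_retain(shared_buf *b) {
    atomic_fetch_add_explicit(&b->refcount, 1, memory_order_relaxed);
}

/* Drops one reference. Returns 1 if this call freed the buffer, else 0. */
static int shared_buf_release(shared_buf *b) {
    if (atomic_fetch_sub_explicit(&b->refcount, 1, memory_order_acq_rel) != 1) return 0;
    if (b->on_free) b->on_free(b->len);
    free(b);
    return 1;
}
```

Each holder (for example the engine's hash and the module's index) would keep one reference, and whichever side releases last triggers the callback and the free.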
Module API
New Module APIs
- VM_CreateSharedSDS: Creates a new Shared SDS.
- VM_SharedSDSPtrLen: Retrieves the raw buffer pointer and length of a Shared SDS.
- VM_ReleaseSharedSDS: Decreases the Shared SDS ref-count by 1.
Extended Module APIs
- VM_HashSet: Supports setting a shared SDS in the hash.
- VM_HashGet: Retrieves a shared SDS from the hash and increments its ref-count by 1.
Engine Hash Data-Type
ValkeySearch indexes documents which reside in the engine as t_hash data-type records. While JSON is also supported, it is out of scope for this discussion. The t_hash implementation is based on either list-pack for small datasets or hashtable for larger ones.
Since list-pack performs deep copies, it cannot support intrusive ref-counting semantics. As a result, if list-pack is the underlying data type when a shared SDS is set, e.g. by calling VM_HashSet, it is converted to hashtable. Additionally, for the same reason, a shared SDS is never stored as an embedded value in a hashtable entry.
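The encoding rule described here can be modeled with a small toy sketch. It does not mirror the engine's actual t_hash code: the enum, the function names, and the entry-count threshold are illustrative assumptions, not values taken from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ENC_LISTPACK, ENC_HASHTABLE } hash_encoding;

/* Illustrative size threshold only; not taken from the PR. */
#define TOY_LISTPACK_MAX_ENTRIES 512

/* Decide which encoding a hash uses after a field is set.
 * A shared value forces hashtable, because list-pack would deep-copy
 * the bytes and lose the shared ref-counted buffer. */
static hash_encoding encoding_after_set(hash_encoding current, size_t entries_after,
                                        bool value_is_shared) {
    if (current == ENC_HASHTABLE) return ENC_HASHTABLE;
    if (value_is_shared) return ENC_HASHTABLE;
    if (entries_after > TOY_LISTPACK_MAX_ENTRIES) return ENC_HASHTABLE;
    return ENC_LISTPACK;
}

/* A value may be embedded in the hashtable entry only if it fits and is
 * not shared; embedding would copy the bytes into the entry. */
static bool embedding_allowed(bool value_is_shared, bool fits_in_entry) {
    return fits_in_entry && !value_is_shared;
}
```

Under this model, a small hash stays in list-pack until either it grows past the threshold or the first shared value is set on it.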