Adding support for sharing memory between the module and the engine #1804

Open
wants to merge 2 commits into
base: unstable


@yairgott yairgott commented Mar 1, 2025

Overview

Sharing memory between the module and engine reduces memory overhead by eliminating redundant copies of stored records in the module. This is particularly beneficial for search workloads that require indexing large volumes of documents.

Vectors

Vector similarity search requires storing large volumes of high-dimensional vectors. For example, a single vector with 512 dimensions and 4-byte elements consumes 2048 bytes, and typical workloads often involve millions of vectors. Because there is no memory-sharing mechanism between the module and the engine, ValkeySearch currently doubles memory consumption when indexing vectors, significantly increasing operational costs. This limitation introduces adoption friction and reduces ValkeySearch's competitiveness.
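
As a rough back-of-the-envelope sketch of the overhead described above: the helpers below assume 4-byte elements (which is what 2048 bytes for 512 dimensions implies) and model the duplicated copy as a simple multiplier. The function names and the `ELEM_BYTES` constant are illustrative and not part of the PR.

```c
#include <stdint.h>

#define ELEM_BYTES 4u /* assumption: 4-byte (float32) vector elements */

/* Bytes needed to store one vector with `dims` dimensions. */
static uint64_t vector_bytes(uint64_t dims) {
    return dims * ELEM_BYTES;
}

/* Total bytes for `count` vectors when `copies` copies of each are kept.
 * copies == 2 models one copy in the engine plus one in the module. */
static uint64_t total_bytes(uint64_t dims, uint64_t count, uint64_t copies) {
    return vector_bytes(dims) * count * copies;
}
```

For a hypothetical workload of 10 million 512-dimensional vectors, `total_bytes(512, 10000000, 1)` is 20,480,000,000 bytes (about 20.5 GB), so keeping a second copy costs roughly another 20.5 GB.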

Implementation Details

Memory Allocation Strategy

At a fundamental level, there are two primary allocation strategies:

  • [Chosen] Module-allocated memory shared with the engine.
  • Engine-allocated memory shared with the module.

For ValkeySearch, it is crucial that vectors reside in cache-aligned memory to maximize SIMD optimizations. Allowing the module to allocate memory provides greater flexibility for different use cases, though it introduces slightly higher implementation complexity.
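
To illustrate what a module-supplied, cache-aligned allocation function could look like, here is a minimal sketch using C11 `aligned_alloc`. The 64-byte cache-line size and the helper names are assumptions made for this example; the allocator actually used by ValkeySearch is not shown in this PR description.

```c
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64u /* assumption: 64-byte cache lines, typical on x86-64 */

/* Allocate at least `size` bytes starting on a cache-line boundary.
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so the request is rounded up. Release the result with free(). */
static void *alloc_cache_aligned(size_t size) {
    size_t rounded = (size + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (rounded == 0) rounded = CACHE_LINE; /* avoid a zero-size request */
    return aligned_alloc(CACHE_LINE, rounded);
}

/* Returns nonzero when `p` sits on a cache-line boundary. */
static int is_cache_aligned(const void *p) {
    return ((uintptr_t)p % CACHE_LINE) == 0;
}
```

Letting the module pass in a function like this is what gives it control over alignment; an engine-side allocator would have no way to know the module's SIMD requirements.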

Shared SDS

Shared SDS, a new data type, facilitates module-engine memory sharing with thread-safe intrusive reference counting. It preserves SDS semantics and structure while adding ref-counting and a free callback for statistics tracking.
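
The following is a minimal sketch of thread-safe intrusive reference counting with a free callback, written with C11 atomics. It does not reproduce the real SDS header layout or the PR's code; `shared_buf`, its functions, and the callback signature are hypothetical names chosen for illustration.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: told how many payload bytes were just freed. */
typedef void (*shared_free_cb)(size_t len);

typedef struct shared_buf {
    atomic_size_t refcount; /* intrusive: the count lives in the same allocation */
    shared_free_cb on_free; /* may be NULL */
    size_t len;
    char data[];            /* payload follows the header */
} shared_buf;

/* Creates a buffer holding a copy of `src`; the caller owns the first reference. */
static shared_buf *shared_buf_create(const void *src, size_t len, shared_free_cb cb) {
    shared_buf *b = malloc(sizeof(*b) + len);
    if (b == NULL) return NULL;
    atomic_init(&b->refcount, 1);
    b->on_free = cb;
    b->len = len;
    if (len > 0) memcpy(b->data, src, len);
    return b;
}

static void shared_buf_retain(shared_buf *b) {
    atomic_fetch_add_explicit(&b->refcount, 1, memory_order_relaxed);
}

/* Drops one reference. Returns 1 if this call freed the buffer, else 0. */
static int shared_buf_release(shared_buf *b) {
    if (atomic_fetch_sub_explicit(&b->refcount, 1, memory_order_acq_rel) == 1) {
        if (b->on_free) b->on_free(b->len);
        free(b);
        return 1;
    }
    return 0;
}

/* Example statistics hook (not itself thread-safe; for illustration only). */
static size_t freed_bytes = 0;
static void count_freed(size_t len) { freed_bytes += len; }
```

Because the count is stored inside the buffer itself, any holder of the pointer can retain or release it without a separate control block, and the callback fires exactly once, on the final release.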

A core component that enables thread-safe buffer sharing could be beneficial for use cases beyond modules. One notable advantage is avoiding deep copies of buffers when IO threading is enabled.

Module API

New Module APIs

  • VM_CreateSharedSDS:
    • Creates a new Shared SDS.
    • Accepts an allocation function for fine-grained control (e.g., cache alignment).
    • Accepts a free callback function to track deallocations.
  • VM_SharedSDSPtrLen: Retrieves the raw buffer pointer and length of a Shared SDS.
  • VM_ReleaseSharedSDS: Decreases the Shared SDS ref-count by 1.

Extended Module APIs

  • VM_HashSet: Supports setting a shared SDS in the hash.
  • VM_HashGet: Retrieves a shared SDS from the hash and increments its ref-count by 1.
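
The ownership flow implied by these APIs can be sketched with a toy, single-field hash. This is not the Valkey module API: every `toy_` name is hypothetical, the counter is a plain `int` for brevity (the PR describes thread-safe counting), and the sketch assumes that a set takes its own reference rather than consuming the caller's, which the description does not state.

```c
#include <stdlib.h>

typedef struct toy_value {
    int refcount; /* plain int for brevity; not thread-safe */
    int payload;
} toy_value;

typedef struct toy_hash {
    toy_value *field; /* a single field slot is enough to show the flow */
} toy_hash;

/* Create: the creator holds the first reference. */
static toy_value *toy_create(int payload) {
    toy_value *v = malloc(sizeof(*v));
    if (v == NULL) return NULL;
    v->refcount = 1;
    v->payload = payload;
    return v;
}

/* Release: drop one reference and free on the last one. */
static void toy_release(toy_value *v) {
    if (--v->refcount == 0) free(v);
}

/* Set: the hash takes its own reference instead of copying the payload
 * (assumption, see the paragraph above). */
static void toy_hash_set(toy_hash *h, toy_value *v) {
    v->refcount++;
    if (h->field) toy_release(h->field);
    h->field = v;
}

/* Get: the caller receives a new reference and must release it later. */
static toy_value *toy_hash_get(toy_hash *h) {
    if (h->field) h->field->refcount++;
    return h->field;
}
```

Under these assumptions, a value that is created, set, and then fetched once has a count of 3, and each holder is responsible for releasing only its own reference.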

Engine Hash Data-Type

ValkeySearch indexes documents that reside in the engine as t_hash data-type records. While JSON is also supported, it is out of scope for this discussion. The t_hash implementation is backed by either a listpack for small hashes or a hashtable for larger ones.

Since a listpack performs deep copies, it cannot support intrusive ref-counting semantics. As a result, if a listpack is the underlying encoding when a shared SDS is set, e.g., by calling VM_HashSet, the hash is converted to a hashtable. For the same reason, a shared SDS is never stored as an embedded value in a hashtable entry.
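
The two rules above can be summarized as a small decision sketch. The enum and function names are hypothetical and do not come from the engine source, and the size-based conversion thresholds the engine also applies are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two hash encodings discussed above. */
typedef enum { ENC_LISTPACK, ENC_HASHTABLE } toy_encoding;

/* Encoding a hash must have after a value is set: a shared value forces
 * the hashtable encoding, because a listpack would deep-copy the bytes
 * and drop the ref-counted identity. */
static toy_encoding encoding_after_set(toy_encoding current, bool value_is_shared) {
    if (value_is_shared) return ENC_HASHTABLE;
    return current;
}

/* Whether a value may be embedded (copied) into a hashtable entry:
 * shared values never are, regardless of size. */
static bool can_embed(bool value_is_shared, bool fits_in_entry) {
    return !value_is_shared && fits_in_entry;
}
```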

@yairgott
Author

yairgott commented Mar 1, 2025

Planning to add unit tests soon. Publishing early to get the discussion rolling.

Sharing memory between the module and engine reduces memory overhead by eliminating
redundant copies of stored entries in the module. This is particularly beneficial
for search workloads that require indexing large volumes of stored data.

Shared SDS, a new data type, facilitates module-engine memory sharing with thread-safe
intrusive reference counting. It preserves SDS semantics and structure while adding
ref-counting and a free callback for statistics tracking.

New module APIs:

- VM_CreateSharedSDS: Creates a new Shared SDS.
- VM_SharedSDSPtrLen: Retrieves the raw buffer pointer and length of a Shared SDS.
- VM_ReleaseSharedSDS: Decreases the Shared SDS ref-count by 1.

Extended module APIs:

- VM_HashSet: Now supports setting a shared SDS in the hash.
- VM_HashGet: Retrieves a shared SDS and increments its ref-count by 1.
@yairgott yairgott force-pushed the engine_module_shared_memory branch from 104a4dd to f9aad1a Compare March 1, 2025 09:24

codecov bot commented Mar 1, 2025

Codecov Report

Attention: Patch coverage is 22.11538% with 81 lines in your changes missing coverage. Please review.

Project coverage is 70.93%. Comparing base (3f6581b) to head (47a9487).
Report is 8 commits behind head on unstable.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/module.c | 6.52% | 43 Missing ⚠️ |
| src/sds.c | 25.64% | 29 Missing ⚠️ |
| src/sds.h | 0.00% | 5 Missing ⚠️ |
| src/t_hash.c | 69.23% | 4 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1804      +/-   ##
============================================
- Coverage     70.99%   70.93%   -0.06%     
============================================
  Files           123      123              
  Lines         65651    65749      +98     
============================================
+ Hits          46609    46642      +33     
- Misses        19042    19107      +65     

| Files with missing lines | Coverage Δ |
|---|---|
| src/server.h | 100.00% <ø> (ø) |
| src/t_zset.c | 96.88% <100.00%> (+0.04%) ⬆️ |
| src/t_hash.c | 95.71% <69.23%> (-0.52%) ⬇️ |
| src/sds.h | 78.04% <0.00%> (-4.85%) ⬇️ |
| src/sds.c | 82.69% <25.64%> (-3.98%) ⬇️ |
| src/module.c | 9.60% <6.52%> (-0.01%) ⬇️ |

... and 19 files with indirect coverage changes


@madolson
Member

madolson commented Mar 3, 2025

Consensus is that we don't want to rush the implementation and commit to a specific API now that we are technically past the new API cutoff for the release. If we can converge offline on the design quickly, we'll merge it; otherwise we'll wait until 9.0.

@PingXie PingXie requested review from zuiderkwast and ranshid March 3, 2025 19:54
@yairgott yairgott closed this Mar 3, 2025
@yairgott
Author

yairgott commented Mar 3, 2025

@zuiderkwast, @ranshid, happy to discuss further with you!

It would be valuable for ValkeySearch 1.0 to have such an interface in 8.1; after all, there is no second chance to make a good first impression ;).

@yairgott yairgott reopened this Mar 3, 2025
@yairgott yairgott force-pushed the engine_module_shared_memory branch 6 times, most recently from dd9a9d2 to d601ba1 Compare March 5, 2025 08:13
Signed-off-by: yairgott <[email protected]>
@yairgott yairgott force-pushed the engine_module_shared_memory branch from d601ba1 to 47a9487 Compare March 5, 2025 19:51

2 participants