RFC: use versioning for edge repair in HNSW #35
Conversation
- **Tracks Node Reuse**: Each node has a version counter incremented every time the node is reused (after being deleted).
- **Detects Stale Edges**: Adjacency lists store both an identifier and the version of the target node at link creation time. During search, edges whose stored version doesn’t match the node’s current version are treated as invalid (stale).
- **Supports Lazy Repair**: An asynchronous background process progressively removes or updates invalid edges, preventing them from repeatedly impacting query results.
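The three mechanisms above can be sketched as follows. This is an illustrative sketch, not ValkeySearch or hnswlib code; all names (`NodeSlot`, `VersionedGraph`, etc.) are hypothetical.

```python
# Sketch: version-tagged adjacency lists for detecting stale edges.
class NodeSlot:
    def __init__(self):
        self.version = 0      # incremented each time the slot is reused
        self.deleted = False
        self.edges = []       # list of (target_id, target_version_at_link_time)

class VersionedGraph:
    def __init__(self, capacity):
        self.slots = [NodeSlot() for _ in range(capacity)]

    def delete(self, node_id):
        self.slots[node_id].deleted = True

    def reuse(self, node_id):
        slot = self.slots[node_id]
        slot.version += 1     # invalidates every edge that stored the old version
        slot.deleted = False
        slot.edges = []

    def add_edge(self, src, dst):
        # Record the target's current version alongside the identifier.
        self.slots[src].edges.append((dst, self.slots[dst].version))

    def is_stale(self, edge):
        dst, stored_version = edge
        return self.slots[dst].version != stored_version
```

After `delete(n)` followed by `reuse(n)`, every pre-existing inbound edge to `n` compares a stale stored version against the bumped counter and can be skipped during search without touching the owner's adjacency list.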
Can you clarify what you aim to achieve with node versioning?
Just a reminder: ValkeySearch uses a time-slice mechanism with mutual exclusion between reads and writes to ensure consistency. This guarantees that a mutation is either applied to all indexes or to none, ensuring atomicity. This consistency model applies to all indexes, not just vector indexes.
Hierarchical Navigable Small World (HNSW) graphs are widely used for Approximate Nearest Neighbor (ANN) search due to their efficient graph-based structure. However, in production systems that support dynamic updates (e.g., new data points arriving or existing ones being removed), deleted nodes are sometimes reused (via `allow_replace_deleted_` in hnswlib). This reuse introduces two major problems:
- **Stale Inbound Links**: When a node is deleted and subsequently reused, any existing inbound links from other nodes’ adjacency lists point to a different “state.” These stale references mislead search queries.
Today reuse is not enabled.
- **Recall Degradation**: Searches that traverse stale edges can produce suboptimal or incorrect nearest-neighbor results—an issue with potentially severe impact on high-recall vector search applications.
I'm uncertain about this given the current implementation, which doesn't reuse deleted nodes. On the other hand, since HNSWlib handles deletion by marking nodes as removed without reclaiming them, a large number of deleted nodes could lead to increased query runtimes.
Each encounter with a stale edge enqueues a repair task (e.g., `(owner_node_id, edge_index)` or `(neighbor_id, stored_version)`).
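A minimal sketch of such deferred task collection, assuming `(owner_node_id, edge_index)` tasks and deduplication so a frequently traversed stale edge is enqueued only once (all names hypothetical):

```python
from collections import deque

# Sketch: dedicated queue for repair tasks discovered during search.
class RepairQueue:
    def __init__(self):
        self._pending = deque()
        self._seen = set()    # dedupe: a hot stale edge is enqueued once

    def enqueue(self, owner_node_id, edge_index):
        task = (owner_node_id, edge_index)
        if task not in self._seen:
            self._seen.add(task)
            self._pending.append(task)

    def drain(self, max_tasks):
        # Hand at most max_tasks to the background repair process.
        batch = []
        while self._pending and len(batch) < max_tasks:
            task = self._pending.popleft()
            self._seen.discard(task)
            batch.append(task)
        return batch
```

Bounding `drain` keeps each background pass short so repair work never stalls foreground queries for long.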
- **Batch Repairs**: A background process periodically locks node adjacency lists, removes or fixes stale entries, and moves on.
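One hedged sketch of a batch-repair pass, assuming plain dict-based state (`adjacency` mapping node id to a list of `(target_id, stored_version)` pairs, `current_version` mapping node id to its live counter); a real system would use per-node locks rather than the single lock shown here:

```python
import threading

def repair_batch(adjacency, current_version, tasks, lock):
    """Remove stale edges named by (owner_node_id, edge_index) tasks."""
    # Group tasks by owner so each adjacency list is locked and rewritten once.
    by_owner = {}
    for owner, idx in tasks:
        by_owner.setdefault(owner, set()).add(idx)
    for owner, indices in by_owner.items():
        with lock:
            adjacency[owner] = [
                e for i, e in enumerate(adjacency[owner])
                # Keep edges not named by a task, and edges that turn out
                # to still be valid (stored version matches live version).
                if i not in indices or current_version[e[0]] == e[1]
            ]
```

Re-checking the version under the lock makes the pass idempotent: if the target was reused again (or an edge was already fixed) between enqueue and repair, the task simply becomes a no-op for that entry.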
I think you should provide more details on how the node removal algorithm would work. Specifically:
- Handling Removal of an Entry Point at a Higher Level: How would you accommodate the scenario where the removed node is an entry point at a higher layer?
- Finding Incoming Links: How would you identify all nodes that have links pointing to the removed node, given that HNSW does not maintain reverse links?
- Fixing Stale Entries: Could you clarify what you mean by "stale" entries? Additionally, how would you handle and clean up these entries to maintain graph integrity and avoid performance degradation?
Adding an RFC for improving the hnsw library to allow better handling of deleted nodes in the graph.