RFC: use versioning for edge repair in HNSW #35
Conversation
- **Tracks Node Reuse**: Each node has a version counter incremented every time the node is reused (after being deleted).
- **Detects Stale Edges**: Adjacency lists store both an identifier and the version of the target node at link creation time. During search, edges whose stored version doesn’t match the node’s current version are treated as invalid (stale).
- **Supports Lazy Repair**: An asynchronous background process progressively removes or updates invalid edges, preventing them from repeatedly impacting query results.
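The three mechanisms above can be sketched as follows. This is an illustrative sketch, not ValkeySearch or hnswlib code; all names (`NodeSlot`, `VersionedGraph`, etc.) are hypothetical.

```python
# Sketch: version-tagged adjacency lists for detecting stale edges.
class NodeSlot:
    def __init__(self):
        self.version = 0      # incremented each time the slot is reused
        self.deleted = False
        self.edges = []       # list of (target_id, target_version_at_link_time)

class VersionedGraph:
    def __init__(self, capacity):
        self.slots = [NodeSlot() for _ in range(capacity)]

    def delete(self, node_id):
        self.slots[node_id].deleted = True

    def reuse(self, node_id):
        slot = self.slots[node_id]
        slot.version += 1     # invalidates every edge that stored the old version
        slot.deleted = False
        slot.edges = []

    def add_edge(self, src, dst):
        # Record the target's current version alongside the identifier.
        self.slots[src].edges.append((dst, self.slots[dst].version))

    def is_stale(self, edge):
        dst, stored_version = edge
        return self.slots[dst].version != stored_version
```

After `delete(n)` followed by `reuse(n)`, every pre-existing inbound edge to `n` compares a stale stored version against the bumped counter and can be skipped during search without touching the owner's adjacency list.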
Can you clarify what you aim to achieve with node versioning?
Just a reminder: ValkeySearch uses a time-slice mechanism with mutual exclusion between reads and writes to ensure consistency. This guarantees that a mutation is either applied to all indexes or to none, ensuring atomicity. This consistency model applies to all indexes, not just vector indexes.
Hierarchical Navigable Small World (HNSW) graphs are widely used for Approximate Nearest Neighbor (ANN) search due to their efficient graph-based structure. However, in production systems that support dynamic updates (e.g., new data points arriving or existing ones being removed), deleted nodes are sometimes reused (via `allow_replace_deleted_` in hnswlib). This reuse introduces two major problems:
- **Stale Inbound Links**: When a node is deleted and subsequently reused, any existing inbound links from other nodes’ adjacency lists point to a different “state.” These stale references mislead search queries.
Today reuse is not enabled.
- **Recall Degradation**: Searches that traverse stale edges can produce suboptimal or incorrect nearest-neighbor results—an issue with potentially severe impact on high-recall vector search applications.
I'm uncertain about this given the current implementation, which doesn't reuse deleted nodes. On the other hand, since HNSWlib handles deletion by marking nodes as removed without reclaiming them, a large number of deleted nodes could lead to increased query runtimes.
Each encounter with a stale edge enqueues a repair task (e.g., `(owner_node_id, edge_index)` or `(neighbor_id, stored_version)`).
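A minimal sketch of such deferred task collection, assuming `(owner_node_id, edge_index)` tasks and deduplication so a frequently traversed stale edge is enqueued only once (all names hypothetical):

```python
from collections import deque

# Sketch: dedicated queue for repair tasks discovered during search.
class RepairQueue:
    def __init__(self):
        self._pending = deque()
        self._seen = set()    # dedupe: a hot stale edge is enqueued once

    def enqueue(self, owner_node_id, edge_index):
        task = (owner_node_id, edge_index)
        if task not in self._seen:
            self._seen.add(task)
            self._pending.append(task)

    def drain(self, max_tasks):
        # Hand at most max_tasks to the background repair process.
        batch = []
        while self._pending and len(batch) < max_tasks:
            task = self._pending.popleft()
            self._seen.discard(task)
            batch.append(task)
        return batch
```

Bounding `drain` keeps each background pass short so repair work never stalls foreground queries for long.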
- **Batch Repairs**: A background process periodically locks node adjacency lists, removes or fixes stale entries, and moves on.
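One hedged sketch of a batch-repair pass, assuming plain dict-based state (`adjacency` mapping node id to a list of `(target_id, stored_version)` pairs, `current_version` mapping node id to its live counter); a real system would use per-node locks rather than the single lock shown here:

```python
import threading

def repair_batch(adjacency, current_version, tasks, lock):
    """Remove stale edges named by (owner_node_id, edge_index) tasks."""
    # Group tasks by owner so each adjacency list is locked and rewritten once.
    by_owner = {}
    for owner, idx in tasks:
        by_owner.setdefault(owner, set()).add(idx)
    for owner, indices in by_owner.items():
        with lock:
            adjacency[owner] = [
                e for i, e in enumerate(adjacency[owner])
                # Keep edges not named by a task, and edges that turn out
                # to still be valid (stored version matches live version).
                if i not in indices or current_version[e[0]] == e[1]
            ]
```

Re-checking the version under the lock makes the pass idempotent: if the target was reused again (or an edge was already fixed) between enqueue and repair, the task simply becomes a no-op for that entry.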
I think you should provide more details on how the node removal algorithm would work. Specifically:
- Handling Removal of an Entry Point at a Higher Level: How would you accommodate the scenario where the removed node is an entry point at a higher layer?
- Finding Incoming Links: How would you identify all nodes that have links pointing to the removed node, given that HNSW does not maintain reverse links?
- Fixing Stale Entries: Could you clarify what you mean by "stale" entries? Additionally, how would you handle and clean up these entries to maintain graph integrity and avoid performance degradation?
Adding an RFC for improving the hnsw library to allow better handling of deleted nodes in the graph.