Proximity Map implementation with support for incremental edits. #8686
Merged
nicktobey force-pushed the nicktobey/proximity-map2 branch from 43fd1e4 to 08f51c5 on December 17, 2024 20:27
This was referenced Dec 17, 2024
nicktobey force-pushed the nicktobey/proximity-map2 branch 3 times, most recently from bff6950 to a189e19 on January 3, 2025 20:16
nicktobey force-pushed the nicktobey/proximity-map2 branch from 7f6b0fc to e712abf on January 3, 2025 23:05
nicktobey force-pushed the nicktobey/proximity-map2 branch from e712abf to 3d20dd6 on January 4, 2025 01:01
nicktobey force-pushed the nicktobey/proximity-map2 branch from 3d20dd6 to eea16a4 on January 6, 2025 01:22
…ncy on github.com/esote/minmaxheap)
… same hash as if they were made directly.
…om the parent, so cap the tree level used when rebuilding the subtree.
Based on #8408, now with additional functionality for incremental changes to indexes.
This is a large-scale PR merging several features into main, all designed for supporting vector indexes.
Vector Index Nodes
1defec9 adds a new message/node type: the vector index node. This message stores a node in a Merkle tree index whose structure is based on some distance measure in a multi-dimensional space: at each level, keys are arranged such that a key is closer to its parent key than to any other key in the parent node.
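The placement rule can be illustrated with a minimal sketch. It assumes squared Euclidean distance as the distance measure and plain `[]float32` slices as keys; the names `sqDist` and `closestParent` are hypothetical and are not taken from Dolt's code.

```go
package main

import "fmt"

// sqDist returns the squared Euclidean distance between two vectors
// of equal length. (Assumed distance measure, for illustration only.)
func sqDist(a, b []float32) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// closestParent returns the index of the parent key nearest to key.
// Ties go to the lowest index. parents must be non-empty.
func closestParent(parents [][]float32, key []float32) int {
	best := 0
	bestDist := sqDist(parents[0], key)
	for i := 1; i < len(parents); i++ {
		if d := sqDist(parents[i], key); d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

func main() {
	parents := [][]float32{{0, 0}, {10, 10}}
	keys := [][]float32{{1, 2}, {9, 8}, {4, 4}}
	for _, k := range keys {
		// Each key is placed under whichever parent key it is closest to.
		fmt.Println(k, "-> child of", parents[closestParent(parents, k)])
	}
}
```

Because placement depends only on distance, nothing in this rule bounds how many keys end up under a single parent.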
One consequence of this design is that it's not possible to put a hard limit on the number of keys contained in each node. We can control the mean node size, but there's always a non-zero chance that a node will be large enough to break our usual encoding scheme (which uses 16-bit ints to store message offsets). To address this, the vector index node uses 32-bit ints to store message offsets instead of the 16 bits used by other node types.
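To see why the offset width matters, here is a rough sketch that computes cumulative item offsets and checks whether they fit in 16 bits. It does not reproduce Dolt's actual serialization layout; `endOffsets` and `fitsIn16Bits` are hypothetical helpers, and the item sizes are made up.

```go
package main

import (
	"fmt"
	"math"
)

// endOffsets returns the cumulative end offset of each item when items
// of the given byte sizes are stored back to back.
func endOffsets(sizes []int) []uint64 {
	offsets := make([]uint64, len(sizes))
	var total uint64
	for i, s := range sizes {
		total += uint64(s)
		offsets[i] = total
	}
	return offsets
}

// fitsIn16Bits reports whether every offset is representable as a uint16.
func fitsIn16Bits(offsets []uint64) bool {
	for _, o := range offsets {
		if o > math.MaxUint16 {
			return false
		}
	}
	return true
}

func main() {
	// An unusually large node: 100 items of 1,024 bytes each (illustrative numbers).
	sizes := make([]int, 100)
	for i := range sizes {
		sizes[i] = 1024
	}
	offs := endOffsets(sizes)
	last := offs[len(offs)-1]
	fmt.Println("fits in 16 bits:", fitsIn16Bits(offs))
	fmt.Println("fits in 32 bits:", last <= math.MaxUint32)
}
```

A 16-bit offset tops out at 65,535 bytes, so any node whose items total more than that cannot be addressed; 32-bit offsets raise the ceiling to roughly 4 GiB.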
Proximity Map
A ProximityMap is a new implementation of Dolt's Map, a data structure built on Merkle trees that maps key bytestrings to value bytestrings. The ProximityMap is backed by a tree of vector index nodes, allowing it to perform an approximate nearest neighbor search.
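One way such a tree can support approximate nearest neighbor search is a greedy descent: at each level, follow the key closest to the query. The sketch below is a simplified illustration under that assumption, not the ProximityMap's actual search; the `node` type, `nearest`, and `exampleTree` are hypothetical, and squared Euclidean distance is assumed.

```go
package main

import "fmt"

// node is a hypothetical in-memory stand-in for a vector index node.
// For internal nodes, children[i] is the subtree under keys[i];
// for leaf nodes, children is nil.
type node struct {
	keys     [][]float32
	children []*node
}

// sqDist returns the squared Euclidean distance between two vectors.
func sqDist(a, b []float32) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// nearest follows a single path from the root, picking the closest key
// at each level, and returns the closest key in the leaf it reaches.
func nearest(root *node, query []float32) []float32 {
	n := root
	for {
		best, bestDist := 0, sqDist(n.keys[0], query)
		for i := 1; i < len(n.keys); i++ {
			if d := sqDist(n.keys[i], query); d < bestDist {
				best, bestDist = i, d
			}
		}
		if n.children == nil {
			return n.keys[best]
		}
		n = n.children[best]
	}
}

// exampleTree builds a two-level tree in which every leaf key is closer
// to its own parent key than to the other parent key.
func exampleTree() *node {
	left := &node{keys: [][]float32{{0, 0}, {1, 2}, {4, 4}}}
	right := &node{keys: [][]float32{{10, 10}, {9, 8}}}
	return &node{
		keys:     [][]float32{{0, 0}, {10, 10}},
		children: []*node{left, right},
	}
}

func main() {
	root := exampleTree()
	fmt.Println(nearest(root, []float32{5, 4}))
	fmt.Println(nearest(root, []float32{6, 5.5}))
}
```

In this toy tree, the query (6, 5.5) descends into the (10, 10) subtree and returns (9, 8), although (4, 4) in the other subtree is closer. That is the sense in which a single-path search is approximate: it trades exactness for visiting only one node per level.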
Proximity Maps resemble other Prolly Maps, but have the following invariants:
Notably, while the keys of an individual node are sorted, walking all of a vector index's keys in standard iteration order does not visit them in sorted order.
28b7065 and 6b91635 contain the bulk of the ProximityMap implementation.
The bulk of the changes are in these three commits. Each of the other commits is a smaller self-contained change necessary to support vector indexes.