Time dimension in keyspace #15

Open
adammck opened this issue Jun 6, 2022 · 0 comments
adammck (Owner) commented Jun 6, 2022

The keyspace currently has only a single dimension: ranges have a start key and an end key, and that's it. We do, however, store the full range history, and include the relevant parts of it with PrepareAddRange commands. So when rebalancing, storage services can choose either to (a) move existing/historical state around, so that queries can fetch old and new writes from a single replica (and benefit from predicate push-down), or (b) simply send future writes to their new home, and stitch old and new writes back together at query time. I haven't made an example implementation of the latter yet, but I've tried to be careful to support it.
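To make option (b) concrete, here is a minimal sketch of query-time stitching. It is not Ranger code: `Write`, `Replica`, `Scan` and `StitchedScan` are hypothetical names, key ranges are assumed to be half-open `[start, end)`, and "replicas" are in-memory slices standing in for real storage nodes. The query fans out to every node that has held the range and merges the results by key, then timestamp.

```go
package main

import (
	"fmt"
	"sort"
)

// Write is a hypothetical record stored by some storage service.
type Write struct {
	Key  string
	TS   int64 // write timestamp, e.g. Unix seconds
	Data string
}

func (w Write) String() string {
	return fmt.Sprintf("%s@%d=%s", w.Key, w.TS, w.Data)
}

// Replica is a hypothetical stand-in for a storage node holding one range.
type Replica struct {
	writes []Write
}

// Scan returns this replica's writes with keys in [start, end).
// A real service could push the predicate down to its storage engine here.
func (r *Replica) Scan(start, end string) []Write {
	var out []Write
	for _, w := range r.writes {
		if w.Key >= start && w.Key < end {
			out = append(out, w)
		}
	}
	return out
}

// StitchedScan queries every replica that has ever held the range and
// merges the partial results, ordered by key and then by timestamp.
func StitchedScan(replicas []*Replica, start, end string) []Write {
	var all []Write
	for _, r := range replicas {
		all = append(all, r.Scan(start, end)...)
	}
	sort.Slice(all, func(i, j int) bool {
		if all[i].Key != all[j].Key {
			return all[i].Key < all[j].Key
		}
		return all[i].TS < all[j].TS
	})
	return all
}

func main() {
	// Old home: writes made before the range was moved.
	old := &Replica{writes: []Write{{"a", 1, "old-a"}, {"c", 2, "old-c"}}}
	// New home: writes made after the move.
	cur := &Replica{writes: []Write{{"a", 5, "new-a"}, {"b", 6, "new-b"}}}

	for _, w := range StitchedScan([]*Replica{old, cur}, "a", "c") {
		fmt.Println(w)
	}
}
```

The cost the issue describes shows up directly: every query touches every historical home of the range, and the merge happens in the query layer rather than inside a single replica.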

Both state-moving and time-stitching have their benefits. The former allows reads and writes to be balanced more precisely based on actual utilization, but is complex to implement, and increases load on the destination node (and possibly the source node) while moving things around. The latter allows write load to be quickly and cheaply rebalanced, at the cost of more complexity and constraints at query time; queries must be supported across multiple storage nodes, which can't always be implemented efficiently. Services must pick their poison.

However! Wouldn't it be splendid if services could have it both ways, by splitting ranges as of right now to quickly balance writes, and moving historical data around to balance reads? There is currently no way for Ranger to tell nodes "split range 19 at time 2022-06-01", leaving two new ranges covering the same keys for different time ranges (presumably [-inf, 2022-06-01] and (2022-06-01, +inf]). Much like #14, the current implementation would be a strict subset of this API, since every range's time bounds would simply always be infinite on both ends. So nodes not wanting this could simply tell Ranger not to split on time, and use a Rangelet interface that doesn't include that option.

adammck added the maybe label Jun 6, 2022