-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let writers specify the chunk size they want #26
base: pkhuong/podman
Are you sure you want to change the base?
Conversation
8385db1
to
c51f61d
Compare
16d1f9a
to
5dea95c
Compare
…nitial Manifest has grown pretty big, as flagged by clippy. TESTED=it builds.
The code so far assumes that readers recognize the fingerprint for a full-sized chunks that's all zero filled. The chunk size is currently constant, 64 KiB. Decouple that constant from the read and write side logic, since it's really become part of the replication protocol. TESTED=partial test.sh
…buf index 10) Still with a default of 64 KB, except with a differently named constant. The (write) tracker's SNAPSHOT_GRANULARITY constant is now WRITE_SNAPSHOT_GRANULARITY, and we can see in the patch that we only had to update write-side code, or constants for performance heuristics. On the write side, we stick in the default `None`: this lets us (temporarily) test what would happen if a new reader accessed old manifests without this new field. TESTED=partial test.sh
Hide WRITE_SNAPSHOT_GRANULARITY behind a function, and rename the constant to `*DEFAULT_*WRITE_SNAPSHOT_GRANULARITY`, to enforce using the function. The Tracker object now keeps track of its snapshot granularity (i.e., chunk size, except for the last chunk which may be shorter) as a field... and pipes it all the way to the serialised ManifestV1. Manual inspection shows that the remaining reference changes (a lot of which also transitively turned into functions) are for performance heuristics (copier.rs, unzstd.rs), and thus also apply the current default as the min value, to avoid acting too weird when the actual chunk size is eventually configured much smaller than expected. TESTED=it builds
with the VERNEUIL_WRITE_SNAPSHOT_GRANULARITY environment variable. The value must be strictly positive, and ideally not too small; the code currently imposes a min value of 64 bytes. Currently an environment variable because there's no clear need for making this configurable via json, but shouldn't be too hard to fix: the Tracker object tracks the current value in an instance field. TESTED=test.sh, w/o size override and w/ export VERNEUIL_WRITE_SNAPSHOT_GRANULARITY={16384,196608}`
c51f61d
to
0678f43
Compare
I'll try to get back to this if @jpihlaja-bt doesn't beat me to it. What is the purpose, though? I don't understand what we gain unless it's that you want to change the sql page size and are hoping that chunks match it. |
There's already only a weak relationship between SQLite's page size and Verneuil's chunk size: we must do the right thing for SQLite page size smaller than 64 KB (e.g., the default value). Making the chunk size tweakable (usually larger, I expect) lets us improve the aggregate cost of operations when IOPS are more expensive than storage; that's particularly interesting for read-heavy workloads on mostly static data, where we really want megabyte-sized chunks. I think the final state might look like a special case for the root page, and smaller pages for B-tree leaves (e.g., after the first ~ sqrt(n) bytes of data), but a single tun-able is already useful. |
The first two commits are small cleanups.
The next commit adds an optional
base_chunk_size
field in the ManifestV1 proto. When this new field is populated, it holds the base chunk size for the chunks referenced by that manifest: all but the last chunk must have that size (the final chunk may be smaller). This new proto field has the unused protobuf index 10.Given this infrastructure, it's easy to make the writer's chunk size configurable (with an environment variable for now), and to pipe that to the manifest.
Tested with
t/test.sh
(i.e., large chunks of the sqlite test suite with validation of the replicated data).