Decouple building and serialization #778
This commit changes a couple of function signatures so that the user can separate where metadata bytes get written to from what is contained in the metadata type. For more info on why, see: <apache#778>
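A minimal sketch of what decoupled signatures could look like. The names `ManifestMeta`, `build_manifest` and `serialize_manifest` are hypothetical (they are not the iceberg-rust API), and a toy line-based format stands in for Avro. Building returns a plain value with no location in it; serialization accepts any `std::io::Write`, so the caller alone decides where the bytes end up.

```rust
use std::io::{self, Write};

/// Hypothetical, simplified stand-in for manifest metadata.
/// Note that it carries no storage location.
#[derive(Debug, Clone, PartialEq)]
pub struct ManifestMeta {
    pub snapshot_id: i64,
    pub data_files: Vec<String>,
}

/// Building: only assembles the metadata value, no I/O involved.
pub fn build_manifest(snapshot_id: i64, data_files: &[&str]) -> ManifestMeta {
    ManifestMeta {
        snapshot_id,
        data_files: data_files.iter().map(|s| s.to_string()).collect(),
    }
}

/// Serialization: writes the metadata to whatever sink the caller provides
/// (a file, a socket, an in-memory buffer, ...). Toy format, not Avro.
pub fn serialize_manifest<W: Write>(meta: &ManifestMeta, sink: &mut W) -> io::Result<()> {
    writeln!(sink, "snapshot_id={}", meta.snapshot_id)?;
    for path in &meta.data_files {
        writeln!(sink, "data_file={}", path)?;
    }
    Ok(())
}
```

Because `Vec<u8>` implements `Write`, a caller can serialize into memory first and hand the bytes to its own storage abstraction afterwards.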
Hey @Sl1mb0, thanks for raising this issue. If I understand correctly: if you could provide your own implementation of `FileIO`, would you be able to make it work? That would avoid the copy. Iceberg works with absolute paths that are immutable after being written, which is very important for merge-on-read operations such as positional deletes.
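A minimal sketch of the idea behind this suggestion, assuming a hypothetical `Storage` trait rather than iceberg-rust's actual `FileIO` type, and a shared in-memory map standing in for storage that every node can reach. If all nodes write through a backend where the same absolute path resolves to the same bytes, the path recorded in the metadata stays valid everywhere and no copy is needed. The write-once check mirrors the point that paths are immutable after being written.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical pluggable storage interface (NOT the iceberg-rust `FileIO` API).
/// Paths are absolute and treated as immutable once written.
pub trait Storage {
    fn write(&self, path: &str, bytes: &[u8]) -> Result<(), String>;
    fn read(&self, path: &str) -> Option<Vec<u8>>;
}

/// Shared in-memory store standing in for storage reachable from every node.
#[derive(Clone, Default)]
pub struct SharedStore {
    files: Arc<Mutex<HashMap<String, Vec<u8>>>>,
}

impl Storage for SharedStore {
    fn write(&self, path: &str, bytes: &[u8]) -> Result<(), String> {
        let mut files = self.files.lock().unwrap();
        // Enforce immutability: an absolute path may only be written once.
        if files.contains_key(path) {
            return Err(format!("{path} already exists"));
        }
        files.insert(path.to_string(), bytes.to_vec());
        Ok(())
    }

    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.files.lock().unwrap().get(path).cloned()
    }
}
```

Cloning a `SharedStore` clones the `Arc`, so two "nodes" holding clones see the same files.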
Hmm - this may work, but it's hard to say given it's not clear what the
I understand that, but in my opinion that is the responsibility of a 'higher-level' layer above building and serialization. Ideally you would have
I should clarify: I would be happy to do this work at some point, but would like to know whether it would be accepted.
Hi @Sl1mb0, thanks for raising this! If I understand correctly, you are asking for two parts:
So that the user can control where this metadata is stored?
At the moment, the building and serialization of Iceberg metadata are coupled together.
For example, let's say I want to build a `ManifestFile` that I then add to a `ManifestList` (some code has been omitted for brevity):
- To get a `ManifestFile`, you have to 'write' a `Manifest`.
- You choose a location to write the `ManifestFile` to; that location is where the `ManifestFile` gets written, and it is included in the metadata of that `ManifestFile`.
- When that `ManifestFile` is added to a `ManifestList`, the location of the `ManifestFile` is what's used to 'point' the `ManifestList` to that `ManifestFile`.
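A minimal model of this flow, assuming hypothetical `write_manifest`, `ManifestFile` and `ManifestList` definitions (not the iceberg-rust types) and a `HashMap` standing in for storage. The only way to obtain a `ManifestFile` is to write it somewhere, and the path chosen at that moment is exactly what the list later records.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified manifest file metadata.
#[derive(Debug, Clone)]
pub struct ManifestFile {
    /// The location chosen at write time becomes part of the metadata.
    pub manifest_path: String,
    pub entry_count: usize,
}

/// Hypothetical manifest list: it only "points" at manifests via stored paths.
#[derive(Debug, Default)]
pub struct ManifestList {
    pub manifest_paths: Vec<String>,
}

/// Writing is the only way to get a `ManifestFile`, so the caller must pick
/// the final location before any metadata exists.
pub fn write_manifest(
    storage: &mut HashMap<String, Vec<u8>>,
    location: &str,
    entries: &[&str],
) -> ManifestFile {
    storage.insert(location.to_string(), entries.join("\n").into_bytes());
    ManifestFile {
        manifest_path: location.to_string(),
        entry_count: entries.len(),
    }
}

impl ManifestList {
    /// Adding a manifest copies its embedded path into the list.
    pub fn add_manifest(&mut self, file: &ManifestFile) {
        self.manifest_paths.push(file.manifest_path.clone());
    }
}
```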
As a result, the user has to rely on the `FileIO`/`OutputFile`/`InputFile` types to write to their preferred storage layer, instead of being able to build or use their own abstractions for where the bytes get written.
In the above example, the `ManifestList` and `ManifestFile` were built and serialized on `Node A` and then copied over to `Node B`. Because the building and serialization were performed on `Node A`, the `ManifestList` on `Node B` points to the `ManifestFile` on `Node A`. This happens because the location the `ManifestFile` was written to is included in its metadata, and that metadata is what tells the `ManifestList` where the `ManifestFile` is.
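A toy reproduction of the `Node A` / `Node B` problem, assuming two hypothetical in-memory node stores and a newline-separated manifest list in place of Avro; `copy_between_nodes` and `dangling_manifests` are illustrative helpers, not library functions. Copying moves the files to new paths, but the absolute path embedded inside the list's bytes is unchanged, so on `Node B` the list still points at `Node A`.

```rust
use std::collections::HashMap;

/// Hypothetical model of one node's local storage: path -> bytes.
pub type NodeStore = HashMap<String, Vec<u8>>;

/// Copies every file from `src` to `dst`, rewriting the path prefix the way a
/// plain file copy between nodes would. File contents are copied unchanged.
pub fn copy_between_nodes(src: &NodeStore, dst: &mut NodeStore, from_prefix: &str, to_prefix: &str) {
    for (path, bytes) in src {
        let new_path = path.replacen(from_prefix, to_prefix, 1);
        dst.insert(new_path, bytes.clone());
    }
}

/// Reads a toy manifest list (one manifest path per line) from `store` and
/// returns the recorded paths that do not exist on that node.
pub fn dangling_manifests(store: &NodeStore, manifest_list_path: &str) -> Vec<String> {
    let list = match store.get(manifest_list_path) {
        Some(bytes) => String::from_utf8_lossy(bytes).into_owned(),
        None => return vec![manifest_list_path.to_string()],
    };
    list.lines()
        .filter(|p| !store.contains_key(*p))
        .map(|p| p.to_string())
        .collect()
}
```

After copying, `Node B` has its own `m1.avro`, yet resolving the list there still yields the `Node A` path as dangling.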