Iceberg-rust Write support #700
Comments
Could I work on supporting table property updates? I'm also interested in working on the commit path, but it's not clear to me which tasks can be started in parallel with the ones that are currently in progress.
I am also interested in the write path. Are there any issues that are independent of the ongoing work and could be worked on in parallel? I'd be happy to take one of them too.
So am I! Rewriting in Rust is the new trend, and the community and contributors are truly interested in the progress of write support!
@barronw Sure thing! I've created a separate ticket for it: #730. Feel free to comment on it so I can assign it to you. @flaneur2020 Sounds good. What do you think of the summary generation? See #724 for details.
@barronw Sure! Just comment on the one you prefer, and then we can assign it to you.
Sorry for showing up late after a business trip. If there's still an open task available, please assign it to me 🙏
Sure :)
Iceberg-rust Write support
I've noticed a lot of interest in write support in Iceberg-rust. This issue aims to break this down into smaller pieces so they can be picked up in parallel.
Appetizers
If you're not super familiar with the codebase, feel free to pick up one of the appetizers below. They may or may not be related to the write path, but they are good things to get in and a good way to get to know the codebase:
DataFileWriterBuilder tests #726
Commit path
The commit path entails writing a new metadata JSON.
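To make "writing a new metadata JSON" a bit more concrete, here is a minimal sketch of the optimistic-concurrency idea behind an Iceberg commit: derive new metadata from the version the writer started from, store it as a new metadata file, and swap the catalog's pointer only if it still references that starting version. Everything here (`TableMetadata`, `InMemoryCatalog`, `append_snapshot`, and the `v{N}.metadata.json` naming) is hypothetical and heavily simplified. It is not the iceberg-rust API and does no real I/O or JSON serialization.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified table metadata (not the iceberg-rust type).
#[derive(Clone, Debug, PartialEq)]
struct TableMetadata {
    /// Version counter; used here only to name the metadata file.
    version: u64,
    /// Snapshot IDs, oldest first.
    snapshot_ids: Vec<i64>,
    current_snapshot_id: Option<i64>,
}

#[derive(Debug, PartialEq)]
enum CommitError {
    /// Another writer committed first: refresh, re-apply, and retry.
    Conflict { expected: String, actual: String },
    NoSuchTable,
}

/// Hypothetical in-memory catalog: table name -> location of its current
/// metadata file, plus a map standing in for the object store.
#[derive(Default)]
struct InMemoryCatalog {
    pointers: HashMap<String, String>,
    files: HashMap<String, TableMetadata>,
}

/// Assumed naming scheme, for this sketch only.
fn metadata_location(table: &str, version: u64) -> String {
    format!("{table}/metadata/v{version}.metadata.json")
}

impl InMemoryCatalog {
    fn create(&mut self, table: &str, metadata: TableMetadata) {
        let location = metadata_location(table, metadata.version);
        self.files.insert(location.clone(), metadata);
        self.pointers.insert(table.to_string(), location);
    }

    /// Returns the current metadata location and a copy of the metadata.
    fn current(&self, table: &str) -> Option<(String, TableMetadata)> {
        let location = self.pointers.get(table)?.clone();
        let metadata = self.files.get(&location)?.clone();
        Some((location, metadata))
    }

    /// Stores `new` as a new metadata file and swaps the table pointer, but only
    /// if the pointer still references `base_location` (compare-and-swap).
    /// A real implementation writes the file (under a unique name) before the
    /// swap; this sketch checks first to keep the in-memory map simple.
    fn commit(
        &mut self,
        table: &str,
        base_location: &str,
        new: TableMetadata,
    ) -> Result<String, CommitError> {
        let actual = self.pointers.get(table).ok_or(CommitError::NoSuchTable)?.clone();
        if actual != base_location {
            return Err(CommitError::Conflict { expected: base_location.to_string(), actual });
        }
        let location = metadata_location(table, new.version);
        self.files.insert(location.clone(), new);
        self.pointers.insert(table.to_string(), location.clone());
        Ok(location)
    }
}

/// Derives new metadata that appends `snapshot_id` and makes it current.
fn append_snapshot(base: &TableMetadata, snapshot_id: i64) -> TableMetadata {
    let mut next = base.clone();
    next.version += 1;
    next.snapshot_ids.push(snapshot_id);
    next.current_snapshot_id = Some(snapshot_id);
    next
}
```

If two writers both start from `v1`, only the first swap succeeds; the second gets a `Conflict`, refreshes the table, re-applies its change on top of `v2`, and retries.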
initial-default #737
add_files.
This is done with the Java API, where during writing the upper and lower bounds are tracked and the numbers of null and NaN values are counted. Most of this is in, except the NaN counts: Implement nan_value_counts && distinct_counts metrics in parquet writer #417
Related operations
These are not on the critical path to enable writes, but are related to it:
SchemaUpdate logic to Iceberg-Rust #697
unionByName to easily union two schemas to provide easy schema evolution: Update a TableSchema from a Schema #698
Metadata tables
Metadata tables are used to inspect the table. Having these tables also makes the maintenance procedures easy to implement, since you can list all the snapshots and expire the ones that are older than a certain threshold.
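As a rough illustration of that maintenance use case, the sketch below takes rows shaped like a snapshots metadata table and selects the snapshots committed before a cutoff, while always keeping the table's current snapshot. `SnapshotRow` and `snapshots_to_expire` are hypothetical names for this example, not part of iceberg-rust.

```rust
/// Hypothetical, simplified row of a snapshots metadata table.
#[derive(Clone, Debug, PartialEq)]
struct SnapshotRow {
    snapshot_id: i64,
    /// Commit time in milliseconds since the Unix epoch.
    timestamp_ms: i64,
}

/// Returns the IDs of snapshots committed strictly before `older_than_ms`,
/// never including the table's current snapshot.
fn snapshots_to_expire(
    rows: &[SnapshotRow],
    current_snapshot_id: Option<i64>,
    older_than_ms: i64,
) -> Vec<i64> {
    rows.iter()
        .filter(|row| row.timestamp_ms < older_than_ms)
        .filter(|row| Some(row.snapshot_id) != current_snapshot_id)
        .map(|row| row.snapshot_id)
        .collect()
}
```

A real expiration also has to clean up the manifests and data files that are no longer referenced by any remaining snapshot, which this sketch leaves out.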
Integration Tests
Integration tests with other engines like Spark.
Contribute
If you want to contribute to the upcoming milestone, feel free to comment on this issue. If there is anything unclear or missing, feel free to reach out here as well 👍