Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: migrate from icelake to iceberg-rust #19887

Merged
merged 3 commits into from
Dec 26, 2024
Merged

Conversation

ZENOTME
Copy link
Contributor

@ZENOTME ZENOTME commented Dec 23, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR migrate icelake to iceberg-rust.
After risingwavelabs/iceberg-rust#10, iceberg-rust can support write, so we can migrate all things from icelake to iceberg-rust.

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • My PR contains critical fixes that are necessary to be merged into the latest release.

Documentation

  • My PR needs documentation updates.
Release note

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Dec 23, 2024

We should merge risingwavelabs/iceberg-rust#10 first, and after that, I will rebase this PR and convert it into the normal state.

@Li0k
Copy link
Contributor

Li0k commented Dec 23, 2024

We should merge #10 first, and after that, I will rebase this PR and convert it into the normal state.

Is wrong PR link in the description?

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Dec 23, 2024

We should merge #10 first, and after that, I will rebase this PR and convert it into the normal state.

Is wrong PR link in the description?

Thanks! Good catch!🤣

@ZENOTME ZENOTME force-pushed the zj/migrate_iceberg_rust branch from deb061e to 84d2463 Compare December 23, 2024 06:36
@ZENOTME ZENOTME changed the title feat: migrate from icelake to iceberg-rust (WIP)feat: migrate from icelake to iceberg-rust Dec 23, 2024
@ZENOTME ZENOTME marked this pull request as ready for review December 23, 2024 06:37
@ZENOTME ZENOTME requested a review from a team as a code owner December 23, 2024 06:37
@ZENOTME ZENOTME requested a review from fuyufjh December 23, 2024 06:37
@ZENOTME ZENOTME force-pushed the zj/migrate_iceberg_rust branch 2 times, most recently from b5dac34 to 2a93eba Compare December 23, 2024 16:27
Comment on lines -140 to -161
async fn update_table(self: Arc<Self>, update_table: &UpdateTable) -> icelake::Result<Table> {
execute_with_jni_env(self.jvm, |env| {
let request_str = serde_json::to_string(&CommitTableRequest::try_from(update_table)?)?;

let request_jni_str = env.new_string(&request_str).with_context(|| {
format!("Failed to create jni string from request json: {request_str}.")
})?;

let result_json =
call_method!(env, self.java_catalog.as_obj(), {String updateTable(String)},
&request_jni_str)
.with_context(|| {
format!(
"Failed to update iceberg table: {}",
update_table.table_name()
)
})?;

let rust_json_str = jobj_to_str(env, result_json)?;

let response: CommitTableResponse = serde_json::from_str(&rust_json_str)?;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation should be migrated to iceberg-rust JNI as well.

@graphite-app graphite-app bot requested a review from a team December 24, 2024 06:16
Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM. Thank you so much for this PR!!!

@@ -402,8 +312,57 @@ impl CatalogV2 for JniCatalog {
}

/// Update a table to the catalog.
async fn update_table(&self, _commit: TableCommit) -> iceberg::Result<TableV2> {
todo!()
async fn update_table(&self, mut commit: TableCommit) -> iceberg::Result<TableV2> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TableV2 now can be rename to Table

@@ -82,7 +82,7 @@ impl IcebergProperties {
java_catalog_props.insert("jdbc.password".to_owned(), jdbc_password);
}
// TODO: support java_catalog_props for iceberg source
self.common.load_table_v2(&java_catalog_props).await
self.common.load_table(&java_catalog_props).await
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Methods contain v2 can be renamed as well

}

#[allow(clippy::type_complexity)]
enum IcebergWriterDispatch {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember we had some discussion on the writer trait. Why do we now use enum instead?

@ZENOTME ZENOTME force-pushed the zj/migrate_iceberg_rust branch from db488e8 to 6ff4d64 Compare December 24, 2024 10:31
@graphite-app graphite-app bot requested a review from a team December 24, 2024 11:41
@ZENOTME ZENOTME changed the title (WIP)feat: migrate from icelake to iceberg-rust feat: migrate from icelake to iceberg-rust Dec 25, 2024
@ZENOTME ZENOTME force-pushed the zj/migrate_iceberg_rust branch from 32d5892 to c942311 Compare December 25, 2024 03:39
@ZENOTME ZENOTME requested review from xxchan and Li0k December 25, 2024 04:35
let table_metadata = response.metadata;

let file_io = FileIO::from_path(&response.metadata_location)?
.with_props(self.config.table_io_configs.iter())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we don't need self.config (BaseCatalogConfig is a struct from icelake) anymore, we can simplify it to a map.

Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work!!

BTW I think splitting this PR into smaller ones (separate refactor and feat) can make it easier to review more carefully:

  1. migrate sink to iceberg-rust, without touching legacy code
  2. deleting icelake code
  3. rename *_v2 to *

But I'm also ok to merge it as long as ci passes since iceberg connector is under heavy development status. 🤪

@chenzl25 chenzl25 added this pull request to the merge queue Dec 26, 2024
@chenzl25
Copy link
Contributor

Let's move fast and get rid of icelake as soon as possible, because right now we have some known issues can't be resolved in icelake (like statistics). By migrating to iceberg-rust, we can support more complex nested data type for iceberg engine table, google cloud store integration, and so on.

Merged via the queue into main with commit fa5e442 Dec 26, 2024
32 of 33 checks passed
@chenzl25 chenzl25 deleted the zj/migrate_iceberg_rust branch December 26, 2024 06:03
@ZENOTME
Copy link
Contributor Author

ZENOTME commented Dec 26, 2024

Thanks @chenzl25 ! I will send PR to get rid of all icelake later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants