Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a new TableConfiguration struct to bundle protocol/metadata cross-validation #571

Open
zachschuermann opened this issue Dec 6, 2024 · 8 comments · May be fixed by #644
Open

add a new TableConfiguration struct to bundle protocol/metadata cross-validation #571

zachschuermann opened this issue Dec 6, 2024 · 8 comments · May be fixed by #644
Assignees

Comments

@zachschuermann
Copy link
Collaborator

include (1) schema, (2) protocol, (3) table properties. expose high-level APIs for checking table feature status etc.

@zachschuermann
Copy link
Collaborator Author

also devil's advocate from @nicklan: do we need this? Or should it all just be in Snapshot?

@scovich
Copy link
Collaborator

scovich commented Dec 6, 2024

do we need this? Or should it all just be in Snapshot?

I believe we discussed that at some point, and we realized that transactions (which eventually will need to update this sort of info) would also need a TableConfiguration concept that's not a snapshot?

@zachschuermann
Copy link
Collaborator Author

zachschuermann commented Dec 11, 2024

proposal:

// old Snapshot
pub struct Snapshot {
    pub(crate) table_root: Url,
    pub(crate) log_segment: LogSegment,
    metadata: Metadata,
    protocol: Protocol,
    schema: Schema,
    table_properties: TableProperties,
    pub(crate) column_mapping_mode: ColumnMappingMode,
}

// new Snapshot
pub struct Snapshot {
    pub(crate) table_root: Url,
    pub(crate) log_segment: LogSegment,
    pub(crate) table_config: TableConfiguration
}

// new (pub?) TableConfiguration to wrap up all the 'configuration' of a table and perform all
// the cross-validation between e.g. Protocol and Metadata (table properties) etc.
pub struct TableConfiguration {
    metadata: Metadata,
    protocol: Protocol,
    schema: Schema,
    table_properties: TableProperties,
    column_mapping_mode: ColumnMappingMode, // remove and just compute on-the-fly?
}

// expose a bunch of useful methods to get col mapping mode, DV enablement, etc.
// these have to check both protocol and metadata thus having them all bundled is useful
impl TableConfiguration {
    fn table_properties() -> &TableProperties // etc. to get pieces (or just pub(crate) fields?)
    fn column_mapping_mode() -> ColumnMappingMode { ... }
    ... // more methods to check DV, table features, enablement, etc.
}

@OussamaSaoudi-db
Copy link
Collaborator

I like the proposal 👍

I think TableConfigurations will likely be useful for CDF as well since we need to validate Metadata and Protocol during log replay. Would be nice to check is_cdf_supported(TableConfig) instead of the two separate checks in table_changes/mod.rs.

One thing to note for CDF tho is that we need to be able to incrementally update TableConfiguration with a new Protocol or a new Metadata. This is because a commit could only update Metadata, and have no Protocol actions. This should be easy to pass the data into a newly constructed TableConfiguration, but I just wanted to point that out.

@zachschuermann
Copy link
Collaborator Author

Yea nice thanks - I think this underscores the idea the TableConfiguration can exist (usefully) outside of a Snapshot?

And yea you would do let table_config = TableConfiguration::new(protocol, metadata) and something like table_config.cdf_enabled()

@OussamaSaoudi-db
Copy link
Collaborator

OussamaSaoudi-db commented Dec 14, 2024

Yep! Tho I think we'll need to be able to construct a new one with the old metadata. Ex:

let (Some(protocol), None) = (commit.protocol, commit.metadata) {
   table_config = table_config.with_protocol(protocol)
}

or

let (Some(protocol), None) = (commit.protocol, commit.metadata) {
   let metadata = table_config.metadata();
   table_config = TableConfig::try_new(protocol, metadata)
}

@zachschuermann
Copy link
Collaborator Author

interesting. yea sounds like a case for having that sort of modular API to update a TableConfiguration when you get new protocol or metadata actions.

@OussamaSaoudi
Copy link
Collaborator

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants