-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK] Support CTAS from a table with a deleted default value #4142
base: master
Are you sure you want to change the base?
[SPARK] Support CTAS from a table with a deleted default value #4142
Conversation
val deltaLog = DeltaLog.forTable(spark, TableIdentifier(tableName)) | ||
val protocol = deltaLog.unsafeVolatileSnapshot.protocol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid the unsafeVolatileSnapshot
use
val deltaLog = DeltaLog.forTable(spark, TableIdentifier(tableName)) | |
val protocol = deltaLog.unsafeVolatileSnapshot.protocol | |
val (_, snapshot) = DeltaLog.forTableWithSnapshot(spark, TableIdentifier(tableName)) | |
val protocol = snapshot.protocol |
} | ||
|
||
assert(defaultsTableFeatureEnabled("test_table")) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so this EXISTS_DEFAULT_COLUMN_METADATA_KEY
is applied retroactively on data IIUC. So it's not stored in the parquet files.
The semantics of CTAS are such that we want "a target table produced from reading the data of the source" which may not necessarily have the same target schema. I think we should have some data in this table to test that by removing the metadata where we are we still end up with a consistent target table.
We should also test when the column is added after data already exist via alter table, since it seems to go through a different logic. We should assert on the metadata we expect in the schema (CURRENT_DEFAULT_COLUMN_METADATA_KEY, EXISTS_DEFAULT_COLUMN_METADATA_KEY).
schema.exists( | ||
_.metadata.contains(ResolveDefaultColumnsUtils.EXISTS_DEFAULT_COLUMN_METADATA_KEY)) | ||
} | ||
val metadataForOp = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic could use a comment to explain why we're doing it this way. IIUC we need to preserve the metadata on the source because the defaults are applied on read retroactively in the "exists default" case? So we keep the metadata on the source and we remove it from the target table, which makes it ineligible for DeltaTableUtils.removeInternalWriterMetadata
and ColumnWithDefaultExprUtils.removeDefaultExpressions
.
val metadataForOp = | ||
if (schemaContainsExistsColumnMetadata(txn.metadata.schema)) { | ||
val newSchema = SchemaMergingUtils.transformColumns(schema) { | ||
case (_, field: StructField, _) if field.metadata.contains( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment that default is only available as a top level field?
Which Delta project/connector is this regarding?
Description
Default column values are part of the schema values, also after the default value has actually been removed because otherwise data would need to be backfilled.
Default values for removed defaults are stored in the schema metadata field EXISTS_DEFAULT_COLUMN_METADATA_KEY.
In this PR we remove this field in handleCreateTableAsSelect, because CTAS will actually materialize and hence backfill all the values, so we do not need it any more on the new table.
How was this patch tested?
Added new tests
Does this PR introduce any user-facing changes?
Previously it was not possible to CTAS from a table with a removed default without setting the delta.feature.allowColumnDefaults even though no default value was actually part of the new table. this is now possible.