
Build: Bump io.delta:delta-spark_2.12 from 3.1.0 to 3.2.0 #373

Open: wants to merge 1 commit into base: master
Conversation

@dependabot dependabot bot commented on behalf of github May 12, 2024

Bumps io.delta:delta-spark_2.12 from 3.1.0 to 3.2.0.

Release notes

Sourced from io.delta:delta-spark_2.12's releases.

Delta Lake 3.2.0

We are excited to announce the release of Delta Lake 3.2.0! This release includes several exciting new features.

Highlights

Delta Spark

Delta Spark 3.2.0 is built on Apache Spark™ 3.5. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

The key features of this release are:

  • Support for Liquid clustering: This allows for incremental clustering based on ZCubes and reduces the write amplification by not touching files already well clustered (i.e., files in stable ZCubes). Users can now use the ALTER TABLE CLUSTER BY syntax to change clustering columns and use the DESCRIBE DETAIL command to check the clustering columns. In addition, Delta Spark now supports DeltaTable clusterBy API in both Python and Scala to allow creating clustered tables using DeltaTable API. See the documentation and examples for more information.
  • Preview support for Type Widening: Delta Spark can now change the type of a column from byte to short to integer using the ALTER TABLE t CHANGE COLUMN col TYPE type command or with schema evolution during MERGE and INSERT operations. The table remains readable by Delta 3.2 readers without requiring the data to be rewritten. For compatibility with older versions, a rewrite of the data can be triggered using the ALTER TABLE t DROP FEATURE 'typeWidening-preview' command.
    • Note that this feature is in preview and that tables created with this preview feature enabled may not be compatible with future Delta Spark releases.
  • Support for Vacuum Inventory: Delta Spark now extends the VACUUM SQL command to allow users to specify an inventory table in a VACUUM command. When an inventory table is provided, VACUUM will consider the files listed there instead of doing the full listing of the table directory, which can be time consuming for very large tables. See the docs here.
  • Support for Vacuum Writer Protocol Check: Delta Spark now supports the vacuumProtocolCheck ReaderWriter feature, which ensures consistent application of reader and writer protocol checks during VACUUM operations, addressing potential protocol discrepancies and mitigating the risk of data corruption caused by skipped writer checks.
  • Preview support for In-Commit Timestamps: When enabled, this preview feature persists monotonically increasing timestamps within Delta commits, ensuring they are not affected by file operations. With in-commit timestamps enabled, time travel queries yield consistent results even if the table directory is relocated.
    • Note that this feature is in preview and that tables created with this preview feature enabled may not be compatible with future Delta Spark releases.
  • Deletion Vectors Read Performance Improvements: Two improvements were introduced to Deletion Vectors (DVs) in Delta 3.2.
  • Support for Row Tracking: Delta Spark can now write to tables that maintain information that allows identifying rows across multiple versions of a Delta table. Delta Spark can now also access this tracking information using the two metadata fields _metadata.row_id and _metadata.row_commit_version.
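
Several of the SQL syntaxes named above can be sketched together. The statements below are illustrative only: the table `events`, its columns, and the inventory table are hypothetical names, and the VACUUM inventory syntax should be verified against the Delta 3.2 documentation:

```sql
-- Liquid clustering: change clustering columns, then inspect them
ALTER TABLE events CLUSTER BY (event_date, user_id);
DESCRIBE DETAIL events;  -- clustering columns appear in the output

-- Type widening (preview): widen a byte/short column to integer,
-- or drop the feature to rewrite data for older readers
ALTER TABLE events CHANGE COLUMN retries TYPE INT;
ALTER TABLE events DROP FEATURE 'typeWidening-preview';

-- Vacuum inventory: supply a pre-computed file listing instead of
-- a full directory listing (syntax per Delta docs; verify)
VACUUM events USING INVENTORY (SELECT * FROM file_inventory);

-- Row tracking: read the per-row metadata fields
SELECT _metadata.row_id, _metadata.row_commit_version FROM events;
```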

Other notable changes include:

  • Delta Sharing: reduce the minimum RPC interval in Delta Sharing streaming from 30 seconds to 10 seconds
  • Improve the performance of write operations by skipping collecting commit stats
  • New SQL configurations to specify Delta Log cache size (spark.databricks.delta.delta.log.cacheSize) and retention duration (spark.databricks.delta.delta.log.cacheRetentionMinutes)
  • Fix bug in plan validation due to inconsistent field metadata in MERGE
  • Improved metrics during VACUUM for better visibility
  • Hive Metastore schema sync: The truncation threshold for schemas with long fields is now user configurable

Delta Universal Format (UniForm)

Hudi is now supported by Delta Universal Format (UniForm) in addition to Iceberg. Writing to a Delta UniForm table can generate Hudi metadata alongside the Delta metadata. This feature was contributed by XTable.

Create a UniForm-enabled table that automatically generates Hudi metadata using the following command:

... (truncated)
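
The command itself is truncated in the release notes above. As a sketch only: UniForm is typically enabled through a table property at creation time. The property name `delta.universalFormat.enabledFormats` is taken from the Delta UniForm documentation and should be verified; the table and column names here are invented:

```sql
-- Hypothetical UniForm table that also generates Hudi metadata
CREATE TABLE uniform_hudi_table (id INT, name STRING)
USING DELTA
TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'hudi');
```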

Commits
  • 4e7a342 Setting version to 3.2.0
  • 03759c9 [3.2][Kernel][Writes] Allow transaction retries for blind append (#3055)
  • 4ae6df6 [3.2][Kernel][Writes] Support idempotent writes (#3051)
  • f365eb0 Add documentation link for Vacuum Protocol Check (#3041)
  • 8cb2e78 [Doc] Type Widening documentation (#3025)
  • 1ba4832 [Spark][3.2] Fix CommitInfo.inCommitTimestamp deserialization for very small ...
  • 6453fe5 [3.2][Kernel][Writes] Add support of inserting data into tables (#3030)
  • fe5d931 [3.2][Kernel][Writes] APIs and impl. for creating new tables (#3016)
  • c8bbd5b [Kernel] Refactor all user-facing exceptions to be "KernelExceptions" (#3014)
  • f4555f5 [Kernel] Remove unused ExpressionHandler.isSupported(...) for now (#3018)
  • Additional commits viewable in compare view

Dependabot compatibility score

You can trigger a rebase of this PR by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Note
Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

Bumps [io.delta:delta-spark_2.12](https://github.com/delta-io/delta) from 3.1.0 to 3.2.0.
- [Release notes](https://github.com/delta-io/delta/releases)
- [Commits](delta-io/delta@v3.1.0...v3.2.0)

---
updated-dependencies:
- dependency-name: io.delta:delta-spark_2.12
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot added dependencies Pull requests that update a dependency file java Pull requests that update Java code labels May 12, 2024