Add information from written files to Iceberg conflict detection #24470

Open: wants to merge 3 commits into master

@pajaks pajaks commented Dec 13, 2024

Description

Currently, Iceberg's concurrent write conflict detection relies on predicates received from the engine. In more complicated cases (such as joins, merges, or comparisons between different types), this information is not available at the connector level.

This PR aims to use partition information from the files actually written as a source for conflict detection. If a write creates files in only some partitions, only those partitions need to be checked for conflicts with concurrent writes.
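
The idea in the description can be illustrated with a minimal, self-contained sketch. It is not the code from this PR and does not use the Trino or Iceberg APIs: `WrittenFile`, `touchedPartitions`, and `conflicts` are hypothetical names, and a partition is modeled as a plain string for simplicity.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PartitionConflictSketch {
    // Hypothetical stand-in for a written data file; only its partition value matters here.
    record WrittenFile(String path, String partition) {}

    // Collect the partitions that a write actually produced files for.
    static Set<String> touchedPartitions(List<WrittenFile> files) {
        Set<String> partitions = new HashSet<>();
        for (WrittenFile file : files) {
            partitions.add(file.partition());
        }
        return partitions;
    }

    // Assumption of this sketch: two writes conflict only if they share a partition.
    static boolean conflicts(List<WrittenFile> ours, List<WrittenFile> concurrent) {
        Set<String> ourPartitions = touchedPartitions(ours);
        for (WrittenFile file : concurrent) {
            if (ourPartitions.contains(file.partition())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<WrittenFile> ours = List.of(new WrittenFile("a.parquet", "day=2024-12-13"));
        List<WrittenFile> other = List.of(new WrittenFile("b.parquet", "day=2024-12-14"));
        System.out.println(conflicts(ours, other));
    }
}
```

The point of the sketch is that the set of touched partitions is known after the write even when the engine could not push a usable predicate down to the connector, for example in a merge or join.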

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Dec 13, 2024
@github-actions github-actions bot added the iceberg Iceberg connector label Dec 13, 2024
@ebyhr ebyhr self-requested a review December 16, 2024 09:56
@pajaks pajaks force-pushed the pajaks/iceberg_concurent_merge branch from 078a427 to 9d8bfd3 Compare December 16, 2024 12:05
@pajaks pajaks requested a review from findinpath December 16, 2024 12:38
@pajaks pajaks force-pushed the pajaks/iceberg_concurent_merge branch 4 times, most recently from 0bc5d55 to 44dd035 Compare December 19, 2024 13:23
@pajaks pajaks force-pushed the pajaks/iceberg_concurent_merge branch 2 times, most recently from 7f3392a to 08e0dd7 Compare December 23, 2024 14:15
Map<IcebergColumnHandle, Domain> domainsFromTasks = new HashMap<>();
for (CommitTaskData commitTask : commitTasks) {
    PartitionSpec taskPartitionSpec = PartitionSpecParser.fromJson(schema, commitTask.partitionSpecJson());
    if (commitTask.partitionDataJson().isEmpty() || taskPartitionSpec.isUnpartitioned() || !taskPartitionSpec.equals(partitionSpec)) {

You could potentially use something like `io.trino.plugin.iceberg.IcebergSplitSource#createFileStatisticsDomain` for unpartitioned columns. Not necessary for the current PR, though.
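
As a rough illustration of this suggestion, the sketch below derives a value range from per-file min/max statistics and checks two ranges for overlap. It is an assumption about what such an approach could look like, not the behavior or signature of `createFileStatisticsDomain`: `ColumnBounds`, `span`, and `mayOverlap` are hypothetical names, and a single `long` column stands in for real column statistics.

```java
import java.util.List;

class FileStatsRangeSketch {
    // Hypothetical per-file statistics for one column: inclusive lower and upper bounds.
    record ColumnBounds(long min, long max) {}

    // Merge the bounds of all files written by one operation into one inclusive range.
    static ColumnBounds span(List<ColumnBounds> files) {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("at least one file is required");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (ColumnBounds file : files) {
            min = Math.min(min, file.min());
            max = Math.max(max, file.max());
        }
        return new ColumnBounds(min, max);
    }

    // Inclusive ranges overlap when each one starts no later than the other ends.
    static boolean mayOverlap(ColumnBounds a, ColumnBounds b) {
        return a.min() <= b.max() && b.min() <= a.max();
    }

    public static void main(String[] args) {
        ColumnBounds ours = span(List.of(new ColumnBounds(1, 10), new ColumnBounds(20, 30)));
        System.out.println(mayOverlap(ours, new ColumnBounds(31, 40)));
    }
}
```

Collapsing all files into one range is deliberately coarse: it can report an overlap where the individual file ranges do not intersect, which errs on the side of detecting a conflict.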

@pajaks pajaks force-pushed the pajaks/iceberg_concurent_merge branch 2 times, most recently from c8232a1 to 8e73d7e Compare January 9, 2025 13:23
@pajaks pajaks requested review from raunaqmorarka and ebyhr January 9, 2025 13:57

This pull request has gone a while without any activity. Tagging for triage help: @mosabua

@github-actions github-actions bot added the stale label Jan 31, 2025
@pajaks pajaks force-pushed the pajaks/iceberg_concurent_merge branch from 8e73d7e to 2158554 Compare February 6, 2025 10:57
@pajaks pajaks force-pushed the pajaks/iceberg_concurent_merge branch from 2158554 to 31f23c4 Compare February 6, 2025 11:10
@pajaks pajaks requested a review from raunaqmorarka February 6, 2025 11:10
@github-actions github-actions bot removed the stale label Feb 6, 2025
@pajaks pajaks force-pushed the pajaks/iceberg_concurent_merge branch from 31f23c4 to 4db2b70 Compare February 7, 2025 09:21
@pajaks pajaks force-pushed the pajaks/iceberg_concurent_merge branch from 4db2b70 to a4bad0c Compare February 10, 2025 09:48
@pajaks pajaks force-pushed the pajaks/iceberg_concurent_merge branch from a4bad0c to 8e7d1c0 Compare February 10, 2025 09:51

ebyhr commented Feb 10, 2025

/test-with-secrets sha=8e7d1c0799730d7da18522cc142b023193d22848

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/13238104125
