Redo #7620: Fix merge command when insert value does not have source distributed column #7627

Merged
merged 1 commit into from
Jun 17, 2024

Conversation

JelteF
Contributor

@JelteF JelteF commented Jun 17, 2024

Related to issues #7619 and #7620.
The MERGE command fails when the source query is single-sharded, the source and target are co-located, and the INSERT does not use the source's distribution key.

Example

CREATE TABLE source (id integer);
CREATE TABLE target (id integer);

-- distribute both tables on the id column
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
          WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey);

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value

Author's opinion: If the join is not between the source and target distribution columns, we should not force the user to use the source's distribution column when inserting a value into the target's distribution column.

Fix: If the INSERT does not use the source's distribution key, do not push the query down to the workers, and do not require the source's distribution column when it is not part of the join.
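
To make the distinction concrete, here is a hedged sketch that reuses the `source` and `target` tables from the example. It assumes, based only on the error message quoted above, that a MERGE whose INSERT takes its value directly from the source's distribution column satisfied the earlier check, and that with this change the original statement is accepted as well. The `EXPLAIN` statement is just one way to inspect which plan gets chosen; its output is not reproduced here because it depends on the cluster.

```sql
-- Variant that inserts the source's distribution column itself (s.id).
-- Assumption: this is the form the earlier check expected.
MERGE INTO target t
  USING (SELECT id FROM source WHERE source.id = 1) s
  ON t.id = s.id
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.id);

-- Statement from the example: the inserted value (s.somekey) is a constant
-- alias, not source.id. Per the fix described above, this is expected to be
-- planned without pushdown instead of raising the error.
MERGE INTO target t
  USING (SELECT 1 AS somekey FROM source WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey);

-- Optional: look at the plan for the second form without executing it.
EXPLAIN
MERGE INTO target t
  USING (SELECT 1 AS somekey FROM source WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey);
```

Both MERGE statements target the same row (`id = 1`), so on an empty `target` only the first one that runs inserts anything; the second then finds a match and does nothing.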

This reverts commit fa4fc0b.

DESCRIPTION: Allow using MERGE in some more situations

@JelteF JelteF enabled auto-merge (squash) June 17, 2024 13:51

codecov bot commented Jun 17, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 87.49%. Comparing base (fa4fc0b) to head (ab1121c).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7627      +/-   ##
==========================================
- Coverage   89.69%   87.49%   -2.20%     
==========================================
  Files         283      283              
  Lines       60506    60505       -1     
  Branches     7539     7538       -1     
==========================================
- Hits        54269    52939    -1330     
- Misses       4082     5170    +1088     
- Partials     2155     2396     +241     

@JelteF JelteF disabled auto-merge June 17, 2024 13:59
@JelteF JelteF enabled auto-merge (squash) June 17, 2024 13:59
@JelteF JelteF merged commit aaaf637 into main Jun 17, 2024
156 of 157 checks passed
@JelteF JelteF deleted the redo-7260 branch June 17, 2024 14:07
paragikjain pushed a commit to paragikjain/citus that referenced this pull request Jun 17, 2024
…e source distributed column (citusdata#7627)

Co-authored-by: paragjain <[email protected]>
(cherry picked from commit aaaf637)
paragikjain pushed a commit to paragikjain/citus that referenced this pull request Jun 18, 2024
…e source distributed column (citusdata#7627)

Co-authored-by: paragjain <[email protected]>
(cherry picked from commit aaaf637)
JelteF added a commit that referenced this pull request Jun 18, 2024
…distributed column (#7627)

Co-authored-by: paragjain <[email protected]>
(cherry picked from commit aaaf637)