[Fix] merge command when insert value does not have source distributed column #7620

paragikjain · 2024-06-07T16:47:24Z

Related to issue #7619.
The MERGE command fails when the source query is single-sharded, the source and target are co-located, and the INSERT does not use the source's distribution key.

Example

CREATE TABLE source (id integer);
CREATE TABLE target (id integer);

-- distribute both tables on the id column
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
          WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey);

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value

Author's opinion: If the join is not between the source and target distribution columns, we should not force the user to use the source distribution column when inserting the value of the target distribution column.

Fix: If the INSERT does not use the source's distribution key, do not push the query down to the workers, and do not require the source distribution column when it is not part of the join.
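The behavior change described above can be sketched as a small decision function. This is only a toy model under stated assumptions: the type and function names (`MergePlanChoice`, `MergeFacts`, `ChooseBeforeFix`, `ChooseAfterFix`) are hypothetical and are not taken from `merge_planner.c`, and the sketch assumes every other pushdown condition (co-located tables, single-shard source query) already holds.

```c
#include <stdbool.h>

/* Hypothetical outcomes of planning a MERGE; not actual Citus identifiers. */
typedef enum MergePlanChoice
{
	MERGE_PLAN_PUSHDOWN,    /* push the MERGE down to the workers */
	MERGE_PLAN_NO_PUSHDOWN, /* plan the MERGE without pushing it down */
	MERGE_PLAN_ERROR        /* reject the MERGE with an error */
} MergePlanChoice;

/* Hypothetical summary of the single property this PR is concerned with. */
typedef struct MergeFacts
{
	/* true when the INSERT value comes from the source distribution column */
	bool insertUsesSourceDistKey;
} MergeFacts;

/* Behavior reported in the description: such a MERGE is rejected. */
static MergePlanChoice
ChooseBeforeFix(const MergeFacts *facts)
{
	return facts->insertUsesSourceDistKey ? MERGE_PLAN_PUSHDOWN : MERGE_PLAN_ERROR;
}

/* Behavior proposed by the fix: skip pushdown instead of raising the error. */
static MergePlanChoice
ChooseAfterFix(const MergeFacts *facts)
{
	return facts->insertUsesSourceDistKey ? MERGE_PLAN_PUSHDOWN : MERGE_PLAN_NO_PUSHDOWN;
}
```

For the example query, which inserts `s.somekey` (a constant, not `source.id`), the flag would be false: the "before" function models the reported error, and the "after" function models the proposed fallback. The actual checks in the planner are likely more involved than this single flag.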

tejeswarm

LGTM

src/backend/distributed/planner/merge_planner.c

src/test/regress/sql/merge.sql

codecov · 2024-06-13T12:26:29Z

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 73.22%. Comparing base (8c9de08) to head (16e4e15).

❗ Current head 16e4e15 differs from pull request most recent head 449c850

Please upload reports for the commit 449c850 to get more accurate results.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #7620       +/-   ##
===========================================
- Coverage   89.69%   73.22%   -16.47%     
===========================================
  Files         283      283               
  Lines       60506    60493       -13     
  Branches     7539     7537        -2     
===========================================
- Hits        54270    44296     -9974     
- Misses       4083    13517     +9434     
- Partials     2153     2680      +527

rajeshkt78

Looks good.


Because we want to track PR numbers and to make backporting easy we (pretty much always) use squash-merges when merging to master. We accidentally used a rebase merge for PR #7620. This reverts those changes so we can redo the merge using squash merge. This reverts all commits from eedb607 to 9e71750.

…distributed column (#7627)

Related to issue #7619, #7620

Merge command fails when source query is single sharded and source and target are co-located and insert is not using distribution key of source.

Example
```
CREATE TABLE source (id integer);
CREATE TABLE target (id integer );

-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
        WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey)

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value
```

Author's Opinion: If join is not between source and target distributed column, we should not force user to use source distributed column while inserting value of target distributed column.

Fix: If user is not using distributed key of source for insertion let's not push down query to workers and don't force user to use source distributed column if it is not part of join.

This reverts commit fa4fc0b.

Co-authored-by: paragjain <[email protected]>
(cherry picked from commit aaaf637)

merge command fix

98a2dac

paragikjain changed the title ~~merge command fix~~ [Fix] merge command when insert value does not have source distributed column Jun 7, 2024

adding update and delete tests

c9ada85

tejeswarm approved these changes Jun 13, 2024

fix some indent

fe7be06

rajeshkt78 reviewed Jun 13, 2024

src/backend/distributed/planner/merge_planner.c (resolved)

rajeshkt78 reviewed Jun 13, 2024

src/backend/distributed/planner/merge_planner.c (resolved)

rajeshkt78 reviewed Jun 13, 2024

src/test/regress/sql/merge.sql (resolved)

rajeshkt78 reviewed Jun 13, 2024

src/test/regress/sql/merge.sql (resolved)

LordParag added 2 commits June 14, 2024 04:35

some more

7a7dc19

some more

21f8982

rajeshkt78 approved these changes Jun 14, 2024

JelteF and others added 7 commits June 14, 2024 14:15

Try to fix failure

89f7217

Revert "Try to fix failure"

517c2c6

This reverts commit 89f7217.

Hopefully fix issue

6b6cf80

Merge branch 'main' of https://github.com/paragikjain/citus into para…

99921e3

…gjain/mergeFix

removing flakyness from test

16e4e15

some more

ed23bb9

fixing flakyness in test

449c850

tejeswarm merged commit 9e71750 into citusdata:main Jun 15, 2024
154 of 155 checks passed

paragikjain mentioned this pull request Jun 17, 2024

Parag/hotfix #7625

Closed

JelteF added a commit that referenced this pull request Jun 17, 2024

Revert rebase merge of #7620

a2e53f3

JelteF mentioned this pull request Jun 17, 2024

Revert rebase merge of #7620 #7626

Merged

JelteF mentioned this pull request Jun 17, 2024

Redo #7620: Fix merge command when insert value does not have source distributed column #7627

Merged
