Redo #7620: Fix merge command when insert value does not have source distributed column #7627

JelteF · 2024-06-17T13:50:30Z

Related to issues #7619 and #7620.
The MERGE command fails when the source query is single-sharded, the source and target are co-located, and the INSERT does not use the distribution key of the source.

Example

CREATE TABLE source (id integer);
CREATE TABLE target (id integer);

-- Distribute both tables on the id column
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
        WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey);

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value

Author's Opinion: If the join is not between the source and target distribution columns, we should not force the user to use the source distribution column when inserting the value of the target distribution column.

Fix: If the user does not use the distribution key of the source for the insertion, do not push the query down to the workers, and do not force the user to use the source distribution column if it is not part of the join.
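For contrast with the failing example, here is a minimal sketch of the form the error message asks for, assuming the same `source` and `target` tables as above. The subquery exposes the source distribution column `id`, and both the join and the INSERT use it. This variant is an illustration only and is not taken from the PR's tests; it is expected not to hit the error, since the inserted value is the source distribution column.

```sql
-- Assumes the source and target tables created and distributed above.
-- The INSERT value (s.id) is the source distribution column itself,
-- and the join is on the distribution columns of both tables.
MERGE INTO target t
  USING ( SELECT id
          FROM source
        WHERE source.id = 1) s
  ON t.id = s.id
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.id);
```

The failing example differs only in that it inserts the constant alias `s.somekey` instead of `s.id`, which is the case this PR routes away from worker pushdown.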

This reverts commit fa4fc0b.

DESCRIPTION: Allow using MERGE in some more situations


codecov · 2024-06-17T13:55:08Z

Codecov Report

Attention: Patch coverage is 50.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 87.49%. Comparing base (fa4fc0b) to head (ab1121c).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7627      +/-   ##
==========================================
- Coverage   89.69%   87.49%   -2.20%     
==========================================
  Files         283      283              
  Lines       60506    60505       -1     
  Branches     7539     7538       -1     
==========================================
- Hits        54269    52939    -1330     
- Misses       4082     5170    +1088     
- Partials     2155     2396     +241

…e source distributed column (citusdata#7627)

Co-authored-by: paragjain <[email protected]>
(cherry picked from commit aaaf637)


JelteF enabled auto-merge (squash) June 17, 2024 13:51

thanodnl approved these changes Jun 17, 2024


JelteF disabled auto-merge June 17, 2024 13:59

JelteF enabled auto-merge (squash) June 17, 2024 13:59

JelteF merged commit aaaf637 into main Jun 17, 2024
156 of 157 checks passed

JelteF deleted the redo-7260 branch June 17, 2024 14:07

paragikjain mentioned this pull request Jun 17, 2024

Backporting Two Fixes To Release12.1 #7628

Closed

paragikjain mentioned this pull request Jun 18, 2024

backporting merge fixes to 12.1 release branch #7629

Merged
