Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] merge command when insert value does not have source distributed column #7620

Merged
merged 12 commits into from
Jun 15, 2024

Conversation

paragikjain
Copy link
Contributor

@paragikjain paragikjain commented Jun 7, 2024

Related to issue #7619
Merge command fails when source query is single sharded and source and target are co-located and insert is not using distribution key of source.

Example

CREATE TABLE source (id integer);
CREATE TABLE target (id integer );

-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
        WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey)

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value

Author's Opinion: If join is not between source and target distributed column, we should not force user to use source distributed column while inserting value of target distributed column.

Fix: If user is not using distributed key of source for insertion let's not push down query to workers and don't force user to use source distributed column if it is not part of join.

@paragikjain paragikjain changed the title merge command fix [Fix] merge command when insert value does not have source distributed column Jun 7, 2024
Copy link
Contributor

@tejeswarm tejeswarm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

codecov bot commented Jun 13, 2024

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 73.22%. Comparing base (8c9de08) to head (16e4e15).

Current head 16e4e15 differs from pull request most recent head 449c850

Please upload reports for the commit 449c850 to get more accurate results.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #7620       +/-   ##
===========================================
- Coverage   89.69%   73.22%   -16.47%     
===========================================
  Files         283      283               
  Lines       60506    60493       -13     
  Branches     7539     7537        -2     
===========================================
- Hits        54270    44296     -9974     
- Misses       4083    13517     +9434     
- Partials     2153     2680      +527     

Copy link
Contributor

@rajeshkt78 rajeshkt78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@tejeswarm tejeswarm merged commit 9e71750 into citusdata:main Jun 15, 2024
154 of 155 checks passed
@paragikjain paragikjain mentioned this pull request Jun 17, 2024
JelteF added a commit that referenced this pull request Jun 17, 2024
JelteF added a commit that referenced this pull request Jun 17, 2024
Because we want to track PR numbers and to make backporting easy we
(pretty much always) use squash-merges when merging to master. We
accidentally used a rebase merge for PR #7620. This reverts those
changes so we can redo the merge using squash merge.
JelteF added a commit that referenced this pull request Jun 17, 2024
Because we want to track PR numbers and to make backporting easy we
(pretty much always) use squash-merges when merging to master. We
accidentally used a rebase merge for PR #7620. This reverts those
changes so we can redo the merge using squash merge.

This reverts all commits from eedb607 to 9e71750.
JelteF added a commit that referenced this pull request Jun 17, 2024
…distributed column

Related to issue #7619, #7620
Merge command fails when source query is single sharded and source and target are co-located and insert is not using distribution key of source.

Example
```
CREATE TABLE source (id integer);
CREATE TABLE target (id integer );

-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
        WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey)

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value
```

Author's Opinion:   If join is not between source and target distributed column, we should not force user to use source distributed column while inserting value of target distributed column.

Fix: If user is not using distributed key of source for insertion let's not push down query to workers and don't force user to use source distributed column if it is not part of join.

This reverts commit fa4fc0b.
JelteF added a commit that referenced this pull request Jun 17, 2024
…distributed column

Related to issue #7619, #7620
Merge command fails when source query is single sharded and source and target are co-located and insert is not using distribution key of source.

Example
```
CREATE TABLE source (id integer);
CREATE TABLE target (id integer );

-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
        WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey)

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value
```

Author's Opinion:   If join is not between source and target distributed column, we should not force user to use source distributed column while inserting value of target distributed column.

Fix: If user is not using distributed key of source for insertion let's not push down query to workers and don't force user to use source distributed column if it is not part of join.

This reverts commit fa4fc0b.

Co-Authored-By: paragjain <[email protected]>
JelteF pushed a commit that referenced this pull request Jun 17, 2024
…distributed column

Related to issue #7619, #7620
Merge command fails when source query is single sharded and source and target are co-located and insert is not using distribution key of source.

Example
```
CREATE TABLE source (id integer);
CREATE TABLE target (id integer );

-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
        WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey)

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value
```

Author's Opinion:   If join is not between source and target distributed column, we should not force user to use source distributed column while inserting value of target distributed column.

Fix: If user is not using distributed key of source for insertion let's not push down query to workers and don't force user to use source distributed column if it is not part of join.

This reverts commit fa4fc0b.
JelteF added a commit that referenced this pull request Jun 17, 2024
…distributed column (#7627)

Related to issue #7619, #7620
Merge command fails when source query is single sharded and source and
target are co-located and insert is not using distribution key of
source.

Example
```
CREATE TABLE source (id integer);
CREATE TABLE target (id integer );

-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
        WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey)

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value
```

Author's Opinion: If join is not between source and target distributed
column, we should not force user to use source distributed column while
inserting value of target distributed column.

Fix: If user is not using distributed key of source for insertion let's
not push down query to workers and don't force user to use source
distributed column if it is not part of join.

This reverts commit fa4fc0b.

Co-authored-by: paragjain <[email protected]>
paragikjain pushed a commit to paragikjain/citus that referenced this pull request Jun 17, 2024
…e source distributed column (citusdata#7627)

Related to issue citusdata#7619, citusdata#7620
Merge command fails when source query is single sharded and source and
target are co-located and insert is not using distribution key of
source.

Example
```
CREATE TABLE source (id integer);
CREATE TABLE target (id integer );

-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
        WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey)

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value
```

Author's Opinion: If join is not between source and target distributed
column, we should not force user to use source distributed column while
inserting value of target distributed column.

Fix: If user is not using distributed key of source for insertion let's
not push down query to workers and don't force user to use source
distributed column if it is not part of join.

This reverts commit fa4fc0b.

Co-authored-by: paragjain <[email protected]>
(cherry picked from commit aaaf637)
paragikjain pushed a commit to paragikjain/citus that referenced this pull request Jun 18, 2024
…e source distributed column (citusdata#7627)

Related to issue citusdata#7619, citusdata#7620
Merge command fails when source query is single sharded and source and
target are co-located and insert is not using distribution key of
source.

Example
```
CREATE TABLE source (id integer);
CREATE TABLE target (id integer );

-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
        WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey)

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value
```

Author's Opinion: If join is not between source and target distributed
column, we should not force user to use source distributed column while
inserting value of target distributed column.

Fix: If user is not using distributed key of source for insertion let's
not push down query to workers and don't force user to use source
distributed column if it is not part of join.

This reverts commit fa4fc0b.

Co-authored-by: paragjain <[email protected]>
(cherry picked from commit aaaf637)
JelteF added a commit that referenced this pull request Jun 18, 2024
…distributed column (#7627)

Related to issue #7619, #7620
Merge command fails when source query is single sharded and source and
target are co-located and insert is not using distribution key of
source.

Example
```
CREATE TABLE source (id integer);
CREATE TABLE target (id integer );

-- let's distribute both table on id field
SELECT create_distributed_table('source', 'id');
SELECT create_distributed_table('target', 'id');

MERGE INTO target t
  USING ( SELECT 1 AS somekey
          FROM source
        WHERE source.id = 1) s
  ON t.id = s.somekey
  WHEN NOT MATCHED
  THEN INSERT (id)
    VALUES (s.somekey)

ERROR:  MERGE INSERT must use the source table distribution column value
HINT:  MERGE INSERT must use the source table distribution column value
```

Author's Opinion: If join is not between source and target distributed
column, we should not force user to use source distributed column while
inserting value of target distributed column.

Fix: If user is not using distributed key of source for insertion let's
not push down query to workers and don't force user to use source
distributed column if it is not part of join.

This reverts commit fa4fc0b.

Co-authored-by: paragjain <[email protected]>
(cherry picked from commit aaaf637)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants