
[Bug Fix] Incorrect Query Results with Distributed Query Plans. citusdata#7698 and citusdata#7697 #7810

Open
wants to merge 3 commits into base: main
Conversation

codeforall

This PR addresses two reported issues: #7698 and #7697.

Both issues involve incorrect query results when a query references both local and distributed tables, and includes a WHERE EXISTS clause on the local table.

For example:
SELECT ...
WHERE EXISTS (SELECT * FROM local_table);
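
A minimal end-to-end sketch of the failing pattern (the table names, schema, and data here are illustrative, not taken from the original reports; create_distributed_table is the standard Citus call):

CREATE TABLE local_table (a int);
CREATE TABLE dist_table (a int);
SELECT create_distributed_table('dist_table', 'a');

INSERT INTO local_table VALUES (1);
INSERT INTO dist_table VALUES (1);

-- The EXISTS subquery references only the local table, so it becomes an
-- InitPlan that must run on the coordinator.
SELECT count(*)
FROM local_table l
JOIN dist_table d ON l.a = d.a
WHERE EXISTS (SELECT 1 FROM local_table);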

In such cases, the WHERE EXISTS clause typically generates an InitPlan (a "one-time filter") that determines whether the rest of the plan's output qualifies for the result. If this InitPlan relies on the contents of a local table, it must be executed locally on the coordinator. However, the planner's decision about whether to convert local or distributed tables into intermediate results fails to account for the references made by the InitPlan. This results in an incorrect query execution plan and, subsequently, incorrect query results.
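
For reference, an uncorrelated EXISTS over the local table shows up in a PostgreSQL plan roughly like this (abbreviated, illustrative EXPLAIN output; the exact shape varies by PostgreSQL version):

Result
  One-Time Filter: $0
  InitPlan 1 (returns $0)
    ->  Seq Scan on local_table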

This PR ensures that when the standard planner (standard_planner) generates an InitPlan in the PlannedStmt, we check the executor parameters (PARAM nodes) in the join qualifiers for relations referenced by the InitPlan. If such references exist, the distributed tables, rather than the local tables, are converted into intermediate results. This adjustment keeps the local tables used by the InitPlan intact so that the InitPlan continues to behave as expected, as sketched below.
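
Conceptually, the corrected plan materializes the distributed side and leaves the local table alone, so the rewritten query looks roughly as follows. The result id, column list, and alias are hypothetical; read_intermediate_result is the Citus function that reads back a materialized intermediate result:

SELECT count(*)
FROM local_table l
JOIN (
    -- The distributed table was recursively planned; its rows were pushed
    -- into an intermediate result that the coordinator can read locally.
    SELECT a
    FROM read_intermediate_result('1_1'::text, 'binary'::citus_copy_format)
        AS res(a int)
) d ON l.a = d.a
-- The InitPlan still references local_table directly and runs on the coordinator.
WHERE EXISTS (SELECT 1 FROM local_table);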

This fix prevents incorrect query results in cases involving mixed local and distributed tables with WHERE EXISTS clauses and improves the accuracy of distributed query planning.

@codeforall
Author

@microsoft-github-policy-service agree company="Microsoft"


codecov bot commented Dec 25, 2024

Codecov Report

Attention: Patch coverage is 0% with 32 lines in your changes missing coverage. Please review.

Project coverage is 26.11%. Comparing base (7341191) to head (792f59b).

❗ There is a different number of reports uploaded between BASE (7341191) and HEAD (792f59b).

HEAD has 80 uploads less than BASE
| Flag | BASE (7341191) | HEAD (792f59b) |
|------|----------------|----------------|
| _upgrade | 19 | 0 |
| 14_regress_check-pytest | 1 | 0 |
| 16_regress_check-pytest | 1 | 0 |
| 14_16_upgrade | 1 | 0 |
| 14_regress_check-columnar-isolation | 1 | 0 |
| 15_16_upgrade | 1 | 0 |
| 14_15_upgrade | 1 | 0 |
| 14_regress_check-follower-cluster | 1 | 0 |
| 15_regress_check-columnar-isolation | 1 | 0 |
| 15_regress_check-follower-cluster | 1 | 0 |
| 14_regress_check-query-generator | 1 | 0 |
| 14_regress_check-split | 1 | 0 |
| 14_regress_check-enterprise-isolation-logicalrep-2 | 1 | 0 |
| 15_regress_check-enterprise-failure | 1 | 0 |
| 14_regress_check-columnar | 1 | 0 |
| 16_regress_check-columnar | 1 | 0 |
| 14_regress_check-enterprise-failure | 1 | 0 |
| 16_regress_check-columnar-isolation | 1 | 0 |
| 16_regress_check-query-generator | 1 | 0 |
| 16_regress_check-enterprise-isolation-logicalrep-3 | 1 | 0 |
| 16_regress_check-enterprise-isolation-logicalrep-2 | 1 | 0 |
| 15_regress_check-columnar | 1 | 0 |
| 16_regress_check-follower-cluster | 1 | 0 |
| 15_regress_check-enterprise-isolation-logicalrep-3 | 1 | 0 |
| 15_regress_check-enterprise-isolation-logicalrep-2 | 1 | 0 |
| 14_regress_check-enterprise-isolation-logicalrep-3 | 1 | 0 |
| 16_regress_check-enterprise-failure | 1 | 0 |
| 15_regress_check-query-generator | 1 | 0 |
| 14_regress_check-enterprise | 1 | 0 |
| 15_regress_check-vanilla | 1 | 0 |
| 16_regress_check-vanilla | 1 | 0 |
| 15_regress_check-enterprise | 1 | 0 |
| 15_regress_check-split | 1 | 0 |
| 14_regress_check-vanilla | 1 | 0 |
| 16_regress_check-split | 1 | 0 |
| 16_regress_check-enterprise | 1 | 0 |
| 16_regress_check-enterprise-isolation | 1 | 0 |
| 16_regress_check-multi-mx | 1 | 0 |
| 14_regress_check-enterprise-isolation | 1 | 0 |
| 16_cdc_installcheck | 1 | 0 |
| 15_regress_check-enterprise-isolation-logicalrep-1 | 1 | 0 |
| 15_cdc_installcheck | 1 | 0 |
| 16_regress_check-enterprise-isolation-logicalrep-1 | 1 | 0 |
| 16_regress_check-operations | 1 | 0 |
| 14_regress_check-operations | 1 | 0 |
| 15_regress_check-operations | 1 | 0 |
| 14_regress_check-isolation | 1 | 0 |
| 14_regress_check-failure | 1 | 0 |
| 16_regress_check-failure | 1 | 0 |
| 15_regress_check-failure | 1 | 0 |
| 15_regress_check-multi-mx | 1 | 0 |
| 14_regress_check-multi-mx | 1 | 0 |
| 15_regress_check-isolation | 1 | 0 |
| 16_regress_check-isolation | 1 | 0 |
| 14_regress_check-multi | 1 | 0 |
| 15_regress_check-multi | 1 | 0 |
| 16_regress_check-multi | 1 | 0 |
| 16_regress_check-multi-1 | 1 | 0 |
| 15_regress_check-multi-1 | 1 | 0 |
| 14_regress_check-multi-1 | 1 | 0 |
| 15_regress_check-enterprise-isolation | 1 | 0 |
| 14_regress_check-enterprise-isolation-logicalrep-1 | 1 | 0 |
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #7810       +/-   ##
===========================================
- Coverage   89.70%   26.11%   -63.59%     
===========================================
  Files         283      283               
  Lines       60515    59626      -889     
  Branches     7542     7360      -182     
===========================================
- Hits        54284    15572    -38712     
- Misses       4074    42167    +38093     
+ Partials     2157     1887      -270     

codeforall marked this pull request as ready for review December 26, 2024 08:23
@onurctirtir
Member

onurctirtir commented Dec 26, 2024

Based on the discussion that we had with @codeforall, #7696 also looks like it is caused by the very same bug, see #7698 (comment).

So, rather than changing the way we decide which relation to recursively plan in the "local / distributed table join" path, Muhammad will spend some time on #7698 (comment) to understand the actual root cause behind those three issues, and I'll help him.

Then, we might want to revisit the initial idea proposed in this PR as a separate optimization / improvement for the "local / distributed table join" planner.
