-
Notifications
You must be signed in to change notification settings - Fork 684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug Fix] Incorrect Query Results with Distributed Query Plans. citusdata#7698 and citusdata#7697 #7810
base: main
Are you sure you want to change the base?
Conversation
This PR addresses two reported issues: citusdata#7698 and citusdata#7697. The Problem: Both issues involve incorrect query results when a query references both local and distributed tables, and includes a WHERE EXISTS clause on the local table. For example: SELECT ... WHERE EXISTS (SELECT * FROM local_table); In such cases, the WHERE EXISTS clause typically generates an InitPlan or "one-time filter," which determines whether the rest of the plan's output qualifies for the result. If this InitPlan relies on the contents of a local table, it must be executed locally on the coordinator. However, the planner's decisions regarding whether to convert local or distributed tables into intermediate results fail to account for the references within the InitPlan. This results in an incorrect query execution plan and, subsequently, incorrect data. The Fix: This PR ensures that when the standard planner (standard_planner) generates an InitPlan in the PlannedStmt, we check the executor parameters (PARAM nodes) in the join qualifiers for relations referenced by the InitPlan. If such references exist, distributed table references are converted to intermediate results rather than local tables. This adjustment ensures that local tables used in the InitPlan remain intact and behave as expected. This fix prevents incorrect query results in cases involving mixed local and distributed tables with WHERE EXISTS clauses and improves the accuracy of distributed query planning.
@microsoft-github-policy-service agree company="Microsoft" |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7810 +/- ##
===========================================
- Coverage 89.70% 26.11% -63.59%
===========================================
Files 283 283
Lines 60515 59626 -889
Branches 7542 7360 -182
===========================================
- Hits 54284 15572 -38712
- Misses 4074 42167 +38093
+ Partials 2157 1887 -270 |
792f59b
to
a89ab7e
Compare
Based on the discussion that we had with @codeforall, #7696 also looks like caused by the very same bug, see #7698 (comment). So, rather than changing the way we decide which relation to recursively plan in "local / distributed table join" path, Muhammad will spend some time on #7698 (comment) to understand the actual root cause behind those three issues and I'll help him. Then, we might want to re-visit the initial idea proposed in this PR as a separate optimization / improvement for "local / distributed table join" planner. |
This PR addresses two reported issues: #7698 and #7697.
Both issues involve incorrect query results when a query references both local and distributed tables, and includes a WHERE EXISTS clause on the local table.
For example:
SELECT ...
WHERE EXISTS (SELECT * FROM local_table);
In such cases, the WHERE EXISTS clause typically generates an InitPlan or "one-time filter," which determines whether the rest of the plan's output qualifies for the result. If this InitPlan relies on the contents of a local table, it must be executed locally on the coordinator. However, the planner's decisions regarding whether to convert local or distributed tables into intermediate results fail to account for the references within the InitPlan. This results in an incorrect query execution plan and, subsequently, incorrect data.
This PR ensures that when the standard planner (standard_planner) generates an InitPlan in the PlannedStmt, we check the executor parameters (PARAM nodes) in the join qualifiers for relations referenced by the InitPlan. If such references exist, distributed table references are converted to intermediate results rather than local tables. This adjustment ensures that local tables used in the InitPlan remain intact and behave as expected.
This fix prevents incorrect query results in cases involving mixed local and distributed tables with WHERE EXISTS clauses and improves the accuracy of distributed query planning.