-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: FULL OUTER JOIN and LIMIT produces wrong results #14338
Conversation
cc @alamb @Dandandan Thanks a lot! And also added the test cases in #14336 It shows right result now. |
I am confident now that we can remove the pushdown limit for full outer join, because i also checked the full outer join pushdown filter, we will not push down for full outer join for both side, code reference: // Get pushable predicates from current optimizer state
let (left_preserved, right_preserved) = lr_is_preserved(join.join_type); pub(crate) fn lr_is_preserved(join_type: JoinType) -> (bool, bool) {
match join_type {
JoinType::Inner => (true, true),
JoinType::Left => (true, false),
JoinType::Right => (false, true),
JoinType::Full => (false, false),
// No columns from the right side of the join can be referenced in output
// predicates for semi/anti joins, so whether we specify t/f doesn't matter.
JoinType::LeftSemi | JoinType::LeftAnti | JoinType::LeftMark => (true, false),
// No columns from the left side of the join can be referenced in output
// predicates for semi/anti joins, so whether we specify t/f doesn't matter.
JoinType::RightSemi | JoinType::RightAnti => (false, true),
}
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, pushing down limit to full join will introduce the issue of correctness, we should avoid it.
# there should be no nulls | ||
# Reproducer for https://github.com/apache/datafusion/issues/14335 | ||
query II | ||
select * from t1 FULL JOIN t2 ON t1.a = t2.b LIMIT 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test will be flaky. (Maybe explain is enough)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree in theory the results from this query is undefined (in the sense that any two rows would be valid)
However, given we are limiting to a single core via
set datafusion.execution.target_partitions = 1;
I think this won't be flaky in practice
Of course I am somewhat biased as I wrote these tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, make sense, I didn't notice the setting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @zhuqi-lucas and @xudong963
# there should be no nulls | ||
# Reproducer for https://github.com/apache/datafusion/issues/14335 | ||
query II | ||
select * from t1 FULL JOIN t2 ON t1.a = t2.b LIMIT 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree in theory the results from this query is undefined (in the sense that any two rows would be valid)
However, given we are limiting to a single core via
set datafusion.execution.target_partitions = 1;
I think this won't be flaky in practice
Of course I am somewhat biased as I wrote these tests
Let's go |
Thank you @alamb and @xudong963! |
Which issue does this PR close?
FULL OUTER JOIN
andLIMIT
produces wrong results #14335Rationale for this change
LIMITs are incorrectly pushed through FULL OUTER Joins
What changes are included in this PR?
disable full outer join limit push down
Are these changes tested?
Added unit test, also the original unit test show right behaviour now.
Are there any user-facing changes?
Yes, FULL OUTER JOIN and LIMIT produces right results now.