-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore usage of filtered SUM #17378
Conversation
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidAggregateCaseToFilterRule.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidAggregateCaseToFilterRule.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidAggregateCaseToFilterRule.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidAggregateCaseToFilterRule.java
Fixed
Show fixed
Hide fixed
{ | ||
public DruidAggregateCaseToFilterRule() | ||
{ | ||
super(operand(Aggregate.class, operand(Project.class, any()))); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note test
CalciteTestBase.makeColumnExpression
{ | ||
public DruidAggregateCaseToFilterRule() | ||
{ | ||
super(operand(Aggregate.class, operand(Project.class, any()))); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note test
CalciteTestBase.makeExpression
{ | ||
public DruidAggregateCaseToFilterRule() | ||
{ | ||
super(operand(Aggregate.class, operand(Project.class, any()))); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note test
CalciteTestBase.makeColumnExpression
&& value.equals(((RexLiteral) rexNode).getValueAs(BigDecimal.class)); | ||
} | ||
|
||
private static class RewriteShuttle extends RexShuttle |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note test
CalciteTestBase.makeColumnExpression
* SUM(CASE WHEN COND THEN COL1 ELSE 0 END) | ||
* without introducing an inner case this should be removed. | ||
*/ | ||
public class DruidAggregateCaseToFilterRule extends RelOptRule implements SubstitutionRule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this rule create a null-handling bug? I thought that was why we stopped doing the transformation. If so I wonder if there's some other way we can fix the performance regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes - it kinda does that
I was exploring to do it differently - one way is to do a more complicated rewrite:
did made it to be able to rewrite it back to a CASE COUNT() = 0 THEN NULL ELSE COALESCE(SUM() FILTER (F), 0) END
; not sure about that path - it really makes it more complicated...
I'm a bit skeptical about its long term pros/cons; we can have it - I have a version about it here:
https://github.com/kgyrtkirk/calcite/pull/7/files
If there would be more such rewrites it might possibly worth it - but right now I don't thin there will be more
the interesting cases for a SUM(CASE WHEN filter THEN valueCol ELSE 0 END)
are:
original expression | filtered-SUM | filtered-SUM0 | |
---|---|---|---|
empty table | null | null | 0 |
filtered row | 0 | null | 0 |
selected; valueCol is null | null | null | 0 |
selected; valueCol is 1 | 1 | 1 | 1 |
I wonder if we could possibly tweak the filtering to return zero in case there are filtered rows - or restore the null-handling bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was exploring the how to handle this with a rewrite that utilizes a customized filteredaggregator which could fix the all rows filtered
case during the calcite/druid translation
but that would need things to be more bent around the filteredaggregator things...
I've stepped back and added a context flag which could restore the compliant behaviour and made the more performant the default - if someone wants to have that 0
then this could be turned off.
import java.util.List; | ||
|
||
/** | ||
* Druid version of {@link AggregateCaseToFilterRule} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this rule completely cover the functionality of AggregateCaseToFilterRule
? If so, can we stop running AggregateCaseToFilterRule
? i.e., by removing CoreRules.AGGREGATE_CASE_TO_FILTER
from CalciteRulesManager
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no - it just provides some extra logic
I think this should be removed when the upstream one could cover this case as well
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidAggregateCaseToFilterRule.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidAggregateCaseToFilterRule.java
Fixed
Show fixed
Hide fixed
sql/src/main/java/org/apache/druid/sql/calcite/rule/DruidAggregateCaseToFilterRule.java
Fixed
Show fixed
Hide fixed
{ | ||
public DruidAggregateCaseToFilterRule() | ||
{ | ||
super(operand(Aggregate.class, operand(Project.class, any()))); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note test
CalciteTestBase.makeColumnExpression
{ | ||
public DruidAggregateCaseToFilterRule() | ||
{ | ||
super(operand(Aggregate.class, operand(Project.class, any()))); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note test
CalciteTestBase.makeColumnExpression
{ | ||
public DruidAggregateCaseToFilterRule() | ||
{ | ||
super(operand(Aggregate.class, operand(Project.class, any()))); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note test
CalciteTestBase.makeColumnExpression
* | >0 | N>0 | 1 | N | N | | ||
* +-----------------+--------------+----------+------+--------------+ | ||
* | ||
* The only inconsistency this causes is when then condition matches all rows; and for all o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete javadoc?
@@ -88,6 +88,8 @@ public class QueryContexts | |||
public static final String UNCOVERED_INTERVALS_LIMIT_KEY = "uncoveredIntervalsLimit"; | |||
public static final String MIN_TOP_N_THRESHOLD = "minTopNThreshold"; | |||
public static final String CATALOG_VALIDATION_ENABLED = "catalogValidationEnabled"; | |||
public static final String EXTENDED_FILTERED_SUM_REWRITE_ENABLED = "extendedFilteredSumRewrite"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the new context parameter in this PR is undocumented and defaults to true
(the old behavior, more performant, yet incorrect in the edge cases). IMO it's OK to have it be undocumented, but we should have a javadoc here explaining what is going on. Can you please add one and link it both to the rule (using @link
) and also says something like: this is here for testing purposes, it is a known issue that the production configuration is incorrect in certain edge cases, and this context parameter can be removed once we have a way to do a rewrite of these case statements that is both high-performance and correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added apidoc-s for this field to explain its purpose - I've also updated the docs for the class doing the rewrite.
|
||
public DruidAggregateCaseToFilterRule(boolean extendedFilteredSumRewrite) | ||
{ | ||
super(operand(Aggregate.class, operand(Project.class, any()))); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note
RelOptRule.operand
|
||
public DruidAggregateCaseToFilterRule(boolean extendedFilteredSumRewrite) | ||
{ | ||
super(operand(Aggregate.class, operand(Project.class, any()))); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note
RelOptRule.operand
|
||
public DruidAggregateCaseToFilterRule(boolean extendedFilteredSumRewrite) | ||
{ | ||
super(operand(Aggregate.class, operand(Project.class, any()))); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation Note
RelOptRule.any
|
||
public DruidAggregateCaseToFilterRule(boolean extendedFilteredSumRewrite) | ||
{ | ||
super(operand(Aggregate.class, operand(Project.class, any()))); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation
|
||
public DruidAggregateCaseToFilterRule(boolean extendedFilteredSumRewrite) | ||
{ | ||
super(operand(Aggregate.class, operand(Project.class, any()))); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation
|
||
public DruidAggregateCaseToFilterRule(boolean extendedFilteredSumRewrite) | ||
{ | ||
super(operand(Aggregate.class, operand(Project.class, any()))); |
Check notice
Code scanning / CodeQL
Deprecated method or constructor invocation
Recent upgrade of calcite have brought a change which have fixed a minor correctness issue by not allowing the rewrite of
SUM(CASE WHEN COND THEN C ELSE 0 END)
toSUM(C) FILTER(COND)
This patch restores the old behaviour as the nested
case_searched
the above may retain in the plans could have serious performance impacts.