Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore usage of filtered SUM #17378

Merged
merged 27 commits into from
Dec 12, 2024
Merged

Conversation

kgyrtkirk
Copy link
Member

Recent upgrade of calcite have brought a change which have fixed a minor correctness issue by not allowing the rewrite of SUM(CASE WHEN COND THEN C ELSE 0 END) to SUM(C) FILTER(COND)

This patch restores the old behaviour as the nested case_searched the above may retain in the plans could have serious performance impacts.

{
public DruidAggregateCaseToFilterRule()
{
super(operand(Aggregate.class, operand(Project.class, any())));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
CalciteTestBase.makeColumnExpression
should be avoided because it has been deprecated.
{
public DruidAggregateCaseToFilterRule()
{
super(operand(Aggregate.class, operand(Project.class, any())));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
CalciteTestBase.makeExpression
should be avoided because it has been deprecated.
{
public DruidAggregateCaseToFilterRule()
{
super(operand(Aggregate.class, operand(Project.class, any())));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
CalciteTestBase.makeColumnExpression
should be avoided because it has been deprecated.
&& value.equals(((RexLiteral) rexNode).getValueAs(BigDecimal.class));
}

private static class RewriteShuttle extends RexShuttle

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
CalciteTestBase.makeColumnExpression
should be avoided because it has been deprecated.
* SUM(CASE WHEN COND THEN COL1 ELSE 0 END)
* without introducing an inner case this should be removed.
*/
public class DruidAggregateCaseToFilterRule extends RelOptRule implements SubstitutionRule
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this rule create a null-handling bug? I thought that was why we stopped doing the transformation. If so I wonder if there's some other way we can fix the performance regression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - it kinda does that

I was exploring to do it differently - one way is to do a more complicated rewrite:
did made it to be able to rewrite it back to a CASE COUNT() = 0 THEN NULL ELSE COALESCE(SUM() FILTER (F), 0) END ; not sure about that path - it really makes it more complicated...
I'm a bit skeptical about its long term pros/cons; we can have it - I have a version about it here:
https://github.com/kgyrtkirk/calcite/pull/7/files
If there would be more such rewrites it might possibly worth it - but right now I don't thin there will be more

the interesting cases for a SUM(CASE WHEN filter THEN valueCol ELSE 0 END) are:

original expression filtered-SUM filtered-SUM0
empty table null null 0
filtered row 0 null 0
selected; valueCol is null null null 0
selected; valueCol is 1 1 1 1

I wonder if we could possibly tweak the filtering to return zero in case there are filtered rows - or restore the null-handling bug

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was exploring the how to handle this with a rewrite that utilizes a customized filteredaggregator which could fix the all rows filtered case during the calcite/druid translation
but that would need things to be more bent around the filteredaggregator things...

I've stepped back and added a context flag which could restore the compliant behaviour and made the more performant the default - if someone wants to have that 0 then this could be turned off.

import java.util.List;

/**
* Druid version of {@link AggregateCaseToFilterRule}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this rule completely cover the functionality of AggregateCaseToFilterRule? If so, can we stop running AggregateCaseToFilterRule? i.e., by removing CoreRules.AGGREGATE_CASE_TO_FILTER from CalciteRulesManager.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no - it just provides some extra logic
I think this should be removed when the upstream one could cover this case as well

{
public DruidAggregateCaseToFilterRule()
{
super(operand(Aggregate.class, operand(Project.class, any())));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
CalciteTestBase.makeColumnExpression
should be avoided because it has been deprecated.
{
public DruidAggregateCaseToFilterRule()
{
super(operand(Aggregate.class, operand(Project.class, any())));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
CalciteTestBase.makeColumnExpression
should be avoided because it has been deprecated.
{
public DruidAggregateCaseToFilterRule()
{
super(operand(Aggregate.class, operand(Project.class, any())));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
CalciteTestBase.makeColumnExpression
should be avoided because it has been deprecated.
* | >0 | N>0 | 1 | N | N |
* +-----------------+--------------+----------+------+--------------+
*
* The only inconsistency this causes is when then condition matches all rows; and for all o
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete javadoc?

@@ -88,6 +88,8 @@ public class QueryContexts
public static final String UNCOVERED_INTERVALS_LIMIT_KEY = "uncoveredIntervalsLimit";
public static final String MIN_TOP_N_THRESHOLD = "minTopNThreshold";
public static final String CATALOG_VALIDATION_ENABLED = "catalogValidationEnabled";
public static final String EXTENDED_FILTERED_SUM_REWRITE_ENABLED = "extendedFilteredSumRewrite";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the new context parameter in this PR is undocumented and defaults to true (the old behavior, more performant, yet incorrect in the edge cases). IMO it's OK to have it be undocumented, but we should have a javadoc here explaining what is going on. Can you please add one and link it both to the rule (using @link) and also says something like: this is here for testing purposes, it is a known issue that the production configuration is incorrect in certain edge cases, and this context parameter can be removed once we have a way to do a rewrite of these case statements that is both high-performance and correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added apidoc-s for this field to explain its purpose - I've also updated the docs for the class doing the rewrite.


public DruidAggregateCaseToFilterRule(boolean extendedFilteredSumRewrite)
{
super(operand(Aggregate.class, operand(Project.class, any())));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
RelOptRule.operand
should be avoided because it has been deprecated.

public DruidAggregateCaseToFilterRule(boolean extendedFilteredSumRewrite)
{
super(operand(Aggregate.class, operand(Project.class, any())));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
RelOptRule.operand
should be avoided because it has been deprecated.

public DruidAggregateCaseToFilterRule(boolean extendedFilteredSumRewrite)
{
super(operand(Aggregate.class, operand(Project.class, any())));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking
RelOptRule.any
should be avoided because it has been deprecated.

public DruidAggregateCaseToFilterRule(boolean extendedFilteredSumRewrite)
{
super(operand(Aggregate.class, operand(Project.class, any())));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [RelOptRule.operand](1) should be avoided because it has been deprecated.

public DruidAggregateCaseToFilterRule(boolean extendedFilteredSumRewrite)
{
super(operand(Aggregate.class, operand(Project.class, any())));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [RelOptRule.operand](1) should be avoided because it has been deprecated.

public DruidAggregateCaseToFilterRule(boolean extendedFilteredSumRewrite)
{
super(operand(Aggregate.class, operand(Project.class, any())));

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation

Invoking [RelOptRule.any](1) should be avoided because it has been deprecated.
@kgyrtkirk kgyrtkirk merged commit 1a38434 into apache:master Dec 12, 2024
82 checks passed
@adarshsanjeev adarshsanjeev added this to the 32.0.0 milestone Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants