Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable null handling by default for leaf stages in the multi-stage query engine #13570

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

yashmayya
Copy link
Collaborator

  • Without null handling enabled in the leaf stages, we get results that don't adhere to the SQL standard for various aggregations. For instance, the MAX aggregate function uses Double.NEGATIVE_INFINITY as its default value when null handling is not enabled. The SQL standard states that if no row qualifies, the result of any aggregate function (other than COUNT) is the null value.
  • With null handling enabled, the various aggregation operators do return null if no doc is matched. This is due to the use of an object based aggregation result holder rather than a primitive based one, for instance here -
    if (_nullHandlingEnabled) {
    return new ObjectAggregationResultHolder();
    }
    return new DoubleAggregationResultHolder(DEFAULT_INITIAL_VALUE);
  • This patch enables null handling by default for leaf stages in the multi-stage query engine. Note that users can still explicitly override this to disable null handling if desired.

@yashmayya yashmayya added multi-stage Related to the multi-stage query engine null support labels Jul 9, 2024
@codecov-commenter
Copy link

codecov-commenter commented Jul 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.09%. Comparing base (59551e4) to head (37a1075).
Report is 731 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13570      +/-   ##
============================================
+ Coverage     61.75%   62.09%   +0.34%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2558     +122     
  Lines        133233   140923    +7690     
  Branches      20636    21867    +1231     
============================================
+ Hits          82274    87511    +5237     
- Misses        44911    46790    +1879     
- Partials       6048     6622     +574     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.06% <100.00%> (+0.35%) ⬆️
java-21 61.98% <100.00%> (+0.36%) ⬆️
skip-bytebuffers-false 62.08% <100.00%> (+0.34%) ⬆️
skip-bytebuffers-true 61.95% <100.00%> (+34.23%) ⬆️
temurin 62.09% <100.00%> (+0.34%) ⬆️
unittests 62.09% <100.00%> (+0.34%) ⬆️
unittests1 46.66% <100.00%> (-0.23%) ⬇️
unittests2 27.66% <0.00%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yashmayya yashmayya force-pushed the v2-enable-null-handling-leaf-stage branch from 8318c4b to 37a1075 Compare July 9, 2024 13:49
@yashmayya yashmayya marked this pull request as ready for review July 9, 2024 13:56
@ankitsultana ankitsultana added the backward-incompat Referenced by PRs that introduce or fix backward compat issues label Jul 9, 2024
Copy link
Contributor

@gortiz gortiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right thing to do, even if the performance may be impacted a bit

@gortiz
Copy link
Contributor

gortiz commented Jul 9, 2024

@Jackie-Jiang what do you think?

@ankitsultana ankitsultana added the release-notes Referenced by PRs that need attention when compiling the next release notes label Jul 9, 2024
@ankitsultana
Copy link
Contributor

Though semantically this is the right thing to do, this will change the behavior of existing queries.

Can we clearly define what our long term approach for this is with the Multistage Engine? One strategy could be that since we still have a way to stick with the existing behavior by setting the query option to false, this is a valid change for a minor release and users should refer to the release-notes for caveats before upgrading their Pinot version.

Also, is there a Wiki page which describes what the current null handling behavior is? I think this one doesn't cover the Multistage Engine quirks completely: https://docs.pinot.apache.org/developers/advanced/null-value-support#examples-queries

I am also happy to contribute to the Wiki and/or the policy for query behavior changes.

@yashmayya
Copy link
Collaborator Author

Can we clearly define what our long term approach for this is with the Multistage Engine

I think the long term goal is to make the v2 query engine SQL compliant and this is one small step towards that. I agree that we should ideally avoid making such default behavior changes and hopefully future releases will have no such surprises once we achieve full compliance.

One strategy could be that since we still have a way to stick with the existing behavior by setting the query option to false, this is a valid change for a minor release and users should refer to the release-notes for caveats before upgrading their Pinot version.

Yeah, I think that and also the fact that this patch is targeting only the v2 multi-stage query engine (which is still fairly new and not as widely adopted) makes the change justifiable.

Also, is there a Wiki page which describes what the current null handling behavior is? I think this one doesn't cover the Multistage Engine quirks completely

For aggregations specifically, I was planning to update this page - https://docs.pinot.apache.org/users/user-guide-query/query-syntax/supported-aggregations if this PR is merged. I agree that we should also update https://docs.pinot.apache.org/developers/advanced/null-value-support.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the correct behavior, but I'm not sure about the performance regression it might introduce. It would be safer if we can verify the performance with the test queries.
Ideally we only want to enable null handling when there are nullable fields. Do you think that is doable?

@gortiz
Copy link
Contributor

gortiz commented Jul 10, 2024

Ideally we only want to enable null handling when there are nullable fields. Do you think that is doable?

I don't think that is ideal. In fact cases like this say otherwise. Even if fields are not nullable, a max(notNullableField) where 1 = 0 should return null.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backward-incompat Referenced by PRs that introduce or fix backward compat issues multi-stage Related to the multi-stage query engine null support release-notes Referenced by PRs that need attention when compiling the next release notes SQL Compliance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants