Enable null handling by default for leaf stages in the multi-stage query engine #13570
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #13570 +/- ##
============================================
+ Coverage 61.75% 62.09% +0.34%
+ Complexity 207 198 -9
============================================
Files 2436 2558 +122
Lines 133233 140923 +7690
Branches 20636 21867 +1231
============================================
+ Hits 82274 87511 +5237
- Misses 44911 46790 +1879
- Partials 6048 6622 +574
I think this is the right thing to do, even if the performance may be impacted a bit
@Jackie-Jiang what do you think?
Though semantically this is the right thing to do, this will change the behavior of existing queries. Can we clearly define our long-term approach for this with the Multistage Engine? One strategy could be: since users can still keep the existing behavior by setting the query option to false, this is a valid change for a minor release, and users should refer to the release notes for caveats before upgrading their Pinot version. Also, is there a Wiki page that describes the current null handling behavior? I think this one doesn't cover the Multistage Engine quirks completely: https://docs.pinot.apache.org/developers/advanced/null-value-support#examples-queries
I am also happy to contribute to the Wiki and/or the policy for query behavior changes.
I think the long term goal is to make the v2 query engine SQL compliant and this is one small step towards that. I agree that we should ideally avoid making such default behavior changes and hopefully future releases will have no such surprises once we achieve full compliance.
Yeah, I think that, together with the fact that this patch targets only the v2 multi-stage query engine (which is still fairly new and not as widely adopted), makes the change justifiable.
For aggregations specifically, I was planning to update this page - https://docs.pinot.apache.org/users/user-guide-query/query-syntax/supported-aggregations if this PR is merged. I agree that we should also update https://docs.pinot.apache.org/developers/advanced/null-value-support.
This is the correct behavior, but I'm not sure about the performance regression it might introduce. It would be safer if we can verify the performance with the test queries.
Ideally we only want to enable null handling when there are nullable fields. Do you think that is doable?
I don't think that is ideal. In fact, cases like this say otherwise. Even if fields are not nullable, a MAX aggregate function uses Double.NEGATIVE_INFINITY as its default value when null handling is not enabled. The SQL standard states that if no row qualifies, the result of any aggregate function (other than COUNT) is the null value.
null if no doc is matched. This is due to the use of an object-based aggregation result holder rather than a primitive-based one, for instance here - pinot/pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/MaxAggregationFunction.java (lines 57 to 60 in efa4300)
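
To make the difference discussed above concrete, here is a minimal, self-contained sketch of the two behaviors for MAX when no row matches. It is not the Pinot implementation: the class name MaxNullHandlingSketch and both method names are hypothetical, chosen only for illustration. The only details taken from the discussion are the Double.NEGATIVE_INFINITY default when null handling is off, and the null result that an object-based holder can represent.

```java
// Illustrative sketch only; NOT the Pinot implementation.
// Class and method names are hypothetical.
class MaxNullHandlingSketch {

    // Null handling disabled: a primitive accumulator must start from some
    // default value, so an empty input yields Double.NEGATIVE_INFINITY.
    static double maxWithoutNullHandling(double[] matchedValues) {
        double max = Double.NEGATIVE_INFINITY;
        for (double value : matchedValues) {
            max = Math.max(max, value);
        }
        return max;
    }

    // Null handling enabled: an object-based holder starts as null and stays
    // null if nothing matched, which is the SQL-standard result for MAX.
    static Double maxWithNullHandling(double[] matchedValues) {
        Double max = null;
        for (double value : matchedValues) {
            max = (max == null) ? value : Math.max(max, value);
        }
        return max;
    }

    public static void main(String[] args) {
        double[] noMatches = {};
        System.out.println(maxWithoutNullHandling(noMatches)); // prints "-Infinity"
        System.out.println(maxWithNullHandling(noMatches));    // prints "null"
    }
}
```

Note that the column itself does not need to be nullable for the two results to differ: an empty match set is enough, which is the point made in the reply above.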