-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass scalar to eq
inside nullif
#11697
Conversation
eq
inside nullif
In the first commmits of this PR, I saw a 4x perf increase, but CI pointed out that a test failed. It turned out the benchmark returned Error, which obviously was faster than doing any computation. The error was So in the end this PR might not give not benefit - feel free to close it if you want. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @simonvandel but if arrow-rs doesn't support, we should have a negative test case that fails?
let lhs = lhs.to_array_of_size(rhs.len())?; | ||
let array = nullif(&lhs, &eq(&lhs, &rhs)?)?; | ||
let lhs_s = lhs.to_scalar()?; | ||
let lhs_a = lhs.to_array_of_size(rhs.len())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could update the arrow-rs nullif kernel to have a special case for a constant (arrow-rs calls this idea "Datum" rather than ScalarValue):
https://docs.rs/arrow/latest/arrow/array/trait.Datum.html
But you can get the Datum
like this:https://docs.rs/datafusion/latest/datafusion/common/enum.ScalarValue.html#method.to_scalar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specializing arrow-rs's nullif kernel for a constant would be the absolute best. But if others can reproduce the small % perf increase of this PR, then perhaps it can be merged in isolation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good improvement regardless as it makes the DataFusion code follow the pattern so it can take advantage of the Datum
special case if/when it is implemented.
Any chance you have a moment to file a ticket upstream in arrow-rs? If not, I will do so
Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look |
Co-authored-by: Oleks V <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @simonvandel and @comphead -- I think this is an improvement even if there are no measurable performance gains yet.
let lhs = lhs.to_array_of_size(rhs.len())?; | ||
let array = nullif(&lhs, &eq(&lhs, &rhs)?)?; | ||
let lhs_s = lhs.to_scalar()?; | ||
let lhs_a = lhs.to_array_of_size(rhs.len())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good improvement regardless as it makes the DataFusion code follow the pattern so it can take advantage of the Datum
special case if/when it is implemented.
Any chance you have a moment to file a ticket upstream in arrow-rs? If not, I will do so
I created apache/arrow-rs#6193 |
Thanks again @simonvandel and @comphead |
Which issue does this PR close?
Closes #.
Rationale for this change
eq
used inside thenullif
has a performance specialization for scalar, but it was not used, as we never passed aScalar
into it.What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?