fix(rust, python): Keep min/max and arg_min/arg_max consistent. #10716
You can take a look at the changed |
ca.into_iter()
.position(|opt_val| matches!(opt_val, Some(true)))
.enumerate()
.find_map(|(idx, val)| match val {
The logic is: we first look for `true`, and if it does not exist, we return the location of the first `false`. Is this exactly what we want?
@reswqa Yes, that's the correct behavior, although I think we could write a faster implementation that iterates over the bitmap more explicitly. But that doesn't have to be in this PR.
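The behavior confirmed above (first `true`, otherwise first `false`, nulls ignored) can be sketched in a single pass. This is only an illustration: `arg_max_bool` is a hypothetical helper and a plain slice of `Option<bool>` stands in for the boolean `ChunkedArray`; it is not the code in this PR.

```rust
/// Hypothetical helper: arg_max over a nullable boolean column,
/// modelled here as a plain slice of `Option<bool>` (an assumption).
fn arg_max_bool(values: &[Option<bool>]) -> Option<usize> {
    let mut first_false = None;
    for (idx, val) in values.iter().enumerate() {
        match val {
            // `true` is the maximum, so the first one wins immediately.
            Some(true) => return Some(idx),
            // Remember only the earliest `false` as a fallback.
            Some(false) if first_false.is_none() => first_false = Some(idx),
            // Nulls and later `false` values never change the answer.
            _ => {}
        }
    }
    first_false
}
```

An all-null or empty input yields `None`, since no valid position exists.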
match ca.is_sorted_flag() {
IsSorted::Ascending => Some(0),
IsSorted::Descending => Some(ca.len() - 1),
IsSorted::Ascending => ca.first_non_null(),
Perhaps we can test `ca.null_count() == 0`: if it doesn't hold, we go here; otherwise, we still follow the previous logic?
I don't think it would matter much in performance. I leave this up to you.
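To make the sorted fast path concrete: with nulls present, index `0` (or `len - 1`) may point at a null, which is why the diff switches to `first_non_null()`. Below is a minimal sketch under stated assumptions: the column is modelled as a slice of `Option<T>`, the `SortedOrder` enum and both helper functions are hypothetical stand-ins, and using a last-non-null lookup for the descending case is my assumption rather than something shown in the diff.

```rust
/// Hypothetical stand-in for the sorted flag of a column.
#[derive(Clone, Copy)]
enum SortedOrder {
    Ascending,
    Descending,
}

/// Index of the first non-null entry, if any.
fn first_non_null<T>(values: &[Option<T>]) -> Option<usize> {
    values.iter().position(|v| v.is_some())
}

/// Index of the last non-null entry, if any.
fn last_non_null<T>(values: &[Option<T>]) -> Option<usize> {
    values.iter().rposition(|v| v.is_some())
}

/// arg_min for a column already known to be sorted.
/// Nulls may sit at either end, so the answer is the first (ascending)
/// or last (descending) valid position rather than 0 or len - 1.
fn arg_min_sorted<T>(values: &[Option<T>], order: SortedOrder) -> Option<usize> {
    match order {
        SortedOrder::Ascending => first_non_null(values),
        SortedOrder::Descending => last_non_null(values),
    }
}
```

When the column has no nulls, both lookups stop at the first element they inspect, which is consistent with the remark that an explicit `null_count() == 0` check would not matter much for performance.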
IsSorted::Not => ca
.into_iter()
.enumerate()
.flat_map(|(idx, val)| val.map(|val| (idx, val)))
`None` will always be considered the minimum value, as `None < Some(_)`. But I'm not quite sure whether it's better to do `flat_map` here or to `match` directly in `reduce`?
Match in reduce will be faster as we have less indirection. I think we should first loop over `downcast_iter` and then loop over the array.
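For illustration, handling nulls with a `match` inside the reduction (instead of a preceding `flat_map`) could look roughly like this. It is a sketch only: `arg_min_str` is a hypothetical name, a slice of `Option<&str>` stands in for the string column, and `fold` is used in place of `reduce` so that the accumulator can start out empty.

```rust
/// Hypothetical helper: arg_min over nullable strings, skipping nulls
/// inside the reduction itself rather than filtering them out first.
fn arg_min_str<'a>(values: &[Option<&'a str>]) -> Option<usize> {
    values
        .iter()
        .copied()
        .enumerate()
        .fold(None, |best: Option<(usize, &'a str)>, (idx, val)| {
            match (best, val) {
                // Nulls never replace the running minimum.
                (best, None) => best,
                // First non-null value seen so far.
                (None, Some(v)) => Some((idx, v)),
                // Strict `<` keeps the earliest index on ties.
                (Some((_, cur)), Some(v)) if v < cur => Some((idx, v)),
                (best, Some(_)) => best,
            }
        })
        .map(|(idx, _)| idx)
}
```

Skipping nulls explicitly matters because a naive comparison on `Option` values would always pick `None` as the minimum.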
> Match in reduce will be faster as we have less indirection.

Makes sense, I will rewrite this.

> I think we should first loop over downcast_iter and then loop over the array.

Does this mean rewriting it to follow the same pattern as `arg_max_numeric`? 🤔
They are similar. We could even share the same generic function if you are up for that. ^^
But that is maybe a nice follow-up PR. :)
I am trying to do this refactoring: let them both share the same generic function, as they are almost identical. The only difference is that for numerical types we have a fast path with the following bound:
for<'b> &'b [T::Native]: ArgMinMax,
But for `Utf8ChunkedArray` we do not have this bound, and we also don't have this branch:
polars/crates/polars-ops/src/series/ops/arg_min_max.rs
Lines 264 to 268 in ecb819a
} else {
// When no nulls & array not empty => we can use fast argminmax
let min_idx: usize = arr.values().as_slice().argmin();
Some((min_idx, arr.value(min_idx)))
};
I can think of some hacky solutions (such as a magical macro), but I don't really like that approach. Do we have a good solution for this that looks simpler and cleaner? 🤔
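One macro-free possibility, offered purely as a sketch and not as what Polars does or should do: keep a single generic driver that walks the chunks and tracks offsets, and let each caller pass in a per-chunk kernel. The numeric caller can then supply a kernel with a no-null fast path, while the string caller supplies a plain null-aware scan. Every name below (`arg_min_chunked`, `scan_kernel`, `numeric_kernel`) is hypothetical, chunks are modelled as `Vec<Option<T>>`, and `min_by_key` merely stands in for the `argmin()` call from the snippet above.

```rust
/// Hypothetical generic driver: combine per-chunk minima into one global index.
fn arg_min_chunked<T, F>(chunks: &[Vec<Option<T>>], kernel: F) -> Option<usize>
where
    T: PartialOrd + Copy,
    F: Fn(&[Option<T>]) -> Option<(usize, T)>,
{
    let mut offset = 0;
    let mut best: Option<(usize, T)> = None;
    for chunk in chunks {
        if let Some((local_idx, val)) = kernel(chunk.as_slice()) {
            // Strict `<` keeps the earliest index on ties across chunks.
            if best.map_or(true, |(_, cur)| val < cur) {
                best = Some((offset + local_idx, val));
            }
        }
        offset += chunk.len();
    }
    best.map(|(idx, _)| idx)
}

/// Null-aware scan usable for any ordered type (e.g. strings).
fn scan_kernel<T: PartialOrd + Copy>(chunk: &[Option<T>]) -> Option<(usize, T)> {
    let mut best: Option<(usize, T)> = None;
    for (idx, val) in chunk.iter().copied().enumerate() {
        if let Some(v) = val {
            if best.map_or(true, |(_, cur)| v < cur) {
                best = Some((idx, v));
            }
        }
    }
    best
}

/// Numeric kernel: when the chunk has no nulls, take a simpler path
/// (a stand-in for a fast argmin); otherwise fall back to the scan.
fn numeric_kernel(chunk: &[Option<i64>]) -> Option<(usize, i64)> {
    if chunk.iter().all(|v| v.is_some()) {
        chunk.iter().flatten().copied().enumerate().min_by_key(|&(_, v)| v)
    } else {
        scan_kernel(chunk)
    }
}
```

With this shape the type-specific bound lives only on the kernel that needs it, so the shared driver stays free of the `ArgMinMax` constraint.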
Thanks @ritchie46 and @orlp for the patient review!
Agree with this, we can merge this first to ensure that it works correctly. The optimization suggestions you have put forward are very helpful, and I will try opening a new |
🚀 Thanks!
This is based on #10708 and fixes #10707.
Some tests may fail because we changed the behavior of `arg_min` and `arg_max`. I will fix those cases one by one.