Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Expressify str.strip_prefix & suffix #11197

Merged
merged 4 commits into from
Sep 21, 2023
Merged

Conversation

reswqa
Copy link
Collaborator

@reswqa reswqa commented Sep 19, 2023

No description provided.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Sep 19, 2023
@@ -17,8 +20,8 @@ where
T: PolarsDataType,
U: PolarsDataType,
V: PolarsDataType,
F: for<'a> FnMut(Option<T::Physical<'a>>, Option<U::Physical<'a>>) -> Option<K>,
V::Array: ArrayFromIter<Option<K>>,
F: for<'a> Helper<Option<T::Physical<'a>>, Option<U::Physical<'a>>, Option<V::Physical<'a>>>,
Copy link
Collaborator Author

@reswqa reswqa Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I feel uncomfortable with this approach(Helper is for HRTB of the output type of FnMut). But I didn't come up with a good solution to the lifetime issue here in a short period of time. I guess there may be a very simple way, perhaps I overestimated this 😞 Any input or suggestion about this will be helpful to me. :)

Next comment has specific error message.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definition shouldn't merge it like this with this helper.

Copy link
Collaborator Author

@reswqa reswqa Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definition shouldn't merge it like this with this helper.

Yes, I also oppose this approach. But It's just for the convenience of bringing up the problem and seeing if there are any good ways to handle this. Or rather, it's throwing bricks to attract jade. :)

I know we have some way to bypass this issue, but I would rather make binary_elementwise happy with this, especially in the case of closures. If there is really no good solution, I will revert this part of the changes and change it to a way that does not require modifying this function.

Comment on lines 346 to 353
_ => binary_elementwise(
ca,
prefix,
|opt_s: Option<&str>, opt_prefix: Option<&str>| match (opt_s, opt_prefix) {
(Some(s), Some(prefix)) => Some(s.strip_prefix(prefix).unwrap_or(s)),
_ => None,
},
),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the log make the compiler unhappy.

error: lifetime may not live long enough
   --> /Users/reswqa/code/rust/polars/crates/polars-ops/src/chunked_array/strings/namespace.rs:350:48
    |
349 |                 |opt_s: Option<&str>, opt_prefix: Option<&str>| match (opt_s, opt_prefix) {
    |                                -                              - return type of closure is std::option::Option<&'2 str>
    |                                |
    |                                let's call the lifetime of this reference `'1`
350 |                     (Some(s), Some(prefix)) => Some(s.strip_prefix(prefix).unwrap_or(s)),
    |                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returning this value requires that `'1` must outlive `'2`

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this closure an actual function and annotate the lifetimes?

Copy link
Collaborator Author

@reswqa reswqa Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you means some thing like this?

fn strip_prefix_op<'a>(opt_s: Option<&'a str>, opt_prefix: Option<&'a str>) -> Option<& 'a str> {
        match (opt_s, opt_prefix) {
            (Some(s), Some(prefix)) => Some(s.strip_prefix(prefix).unwrap_or(s)),
            _ => None,
        }
}

and rewrite to

binary_elementwise(ca, prefix,strip_prefix_op)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this given me:

error[E0308]: mismatched types
   --> /Users/reswqa/code/rust/polars/crates/polars-ops/src/chunked_array/strings/namespace.rs:359:18
    |
359 |               _ => binary_elementwise(
    |  __________________^
360 | |                 ca,
361 | |                 prefix,
362 | |                 strip_prefix_op
363 | |             ),
    | |_____________^ one type is more general than the other
    |
    = note: expected enum `std::option::Option<&'a str>`
               found enum `std::option::Option<&str>`
note: the lifetime requirement is introduced here
   --> /Users/reswqa/code/rust/polars/crates/polars-core/src/chunked_array/ops/arity.rs:20:75
    |
20  |     F: for<'a> FnMut(Option<T::Physical<'a>>, Option<U::Physical<'a>>) -> Option<K>,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a look tomorrow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@reswqa reswqa marked this pull request as ready for review September 19, 2023 17:43
@orlp
Copy link
Collaborator

orlp commented Sep 20, 2023

@reswqa Well, that was a deep dive into HRTBs and nasty stuff. Seems like I underestimated the problem. Nevertheless, I still wanted to keep the function generic, so could you look at and pull from the strip branch of my repo? https://github.com/orlp/polars/tree/strip

@reswqa
Copy link
Collaborator Author

reswqa commented Sep 20, 2023

@orlp Thank you very much for taking the time to dive into this issue. I have looked at the code on your branch and it is much better than the initial Helper I introduced in this PR.

Nevertheless, I still wanted to keep the function generic, so could you look at and pull from the strip branch of my repo?

Yes, I will pick that to this PR. 👍

ps: This problem indeed filled with the magical smell of HRTB :)

@reswqa
Copy link
Collaborator Author

reswqa commented Sep 20, 2023

@orlp I'm a bit lost and don't quite understand why the un-compiled case in CI cannot correctly auto-inference types(forgive me for not being familiar with type gymnastics)... If we need to introduce the helper method like infer_re_match many times, I feel that the benefits of this may not very significant, but rather torment other developers with similar problems in the future. WDYT?🤔

@orlp
Copy link
Collaborator

orlp commented Sep 20, 2023

@reswqa It only gets inferred wrong if there are two different lifetimes in the input. I'm not happy with it either but I don't know how often that will occur. If you know a better solution I'm all ears.

@reswqa
Copy link
Collaborator Author

reswqa commented Sep 20, 2023

@orlp, To be frank, if we want to keep the function more generic, I absolutely cannot come up with a better solution than what you have provided. From my perspective, both of the following options are good:

  • Similar to the solution I provided earlier, but this does result in the loss of some generics. I'm not sure how much negative impact this will have.
  • Keeping the current solution, I suppose there may not be many cases where two inputs have different lifetimes. But I may need some help from you to fix the errors in CI that exist in borrowed.rs. :)

@orlp
Copy link
Collaborator

orlp commented Sep 20, 2023

@reswqa Regarding those errors, that's just that it can't figure out what it should collect into since it immediately pipes into into_series. The old code tried to fix that by passing the return type in the signature, what you can do now is this:

let arr: Float64Chunked = arity::binary_elementwise(lhs, rhs, |opt_l, opt_r| match (
    opt_l, opt_r,
) {
    (Some(l), Some(r)) => {
        if r.is_zero() {
            None
        } else {
            Some(l / r)
        }
    },
    _ => None,
});
Ok(arr.into_series())

@reswqa
Copy link
Collaborator Author

reswqa commented Sep 21, 2023

@orlp Thanks, it works and CI has turned green. Finally, we can do the next round review now.

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest looks great! Just for my understanding which lifetimes were problematic in the old implementation?

@ritchie46 ritchie46 merged commit 431f85f into pola-rs:main Sep 21, 2023
25 checks passed
@reswqa
Copy link
Collaborator Author

reswqa commented Sep 21, 2023

Just for my understanding which lifetimes were problematic in the old implementation?

Assuming the old implementation you mentioned is the code before this PR. Detailed error log per this comment thread: #11197 (comment). It seems that the old implementation unable to establish the lifetimes relationship between the input and output of the closure. Annotate the lifetime for return type of FnMut is not well supported by HRTB, so it looks a bit hack here...

@orlp
Copy link
Collaborator

orlp commented Sep 21, 2023

@ritchie46 The old code specified the return type of the closure as a parameter R on binary_elementwise. So then we had (simplifying a bit to explain the problem more clearly):

F: for<'a> FnMut(&'a str, &'a str) -> R

The problem here is that R is a type completely independent from the lifetime 'a. It requires the closure returns the same type R for all possible 'a. But the closure in question wants to return a &'a str, which depends on 'a.

@ritchie46
Copy link
Member

I see. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants