Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rust, python): fix is_sorted for structs #10099

Merged
merged 1 commit into from
Jul 27, 2023

Conversation

magarick
Copy link
Contributor

Struct series will now compare correctly.
We'll handle the general case later, along with some possible improvements and enhancements to is_sorted itself.

@magarick magarick requested a review from ritchie46 as a code owner July 26, 2023 19:34
@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Jul 26, 2023
Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, looks good. 👍

@ritchie46 ritchie46 merged commit 9729f9c into pola-rs:main Jul 27, 2023
17 checks passed
@magarick magarick deleted the fix-is-sorted-struct branch July 27, 2023 06:14
@s-banach
Copy link
Contributor

Excuse me, but can you run the following test?

import polars as pl

df = (
    pl.DataFrame({"x": [0, 1, 0, 1], "y": [0, 0, 1, 1]})
    .select(pl.struct("x", "y").alias("xy"))
    .sort("xy")
)
assert df["xy"].is_sorted()

I can't really read/write rust, but it looks like you're attempting to test the struct inequality s1 <= s2 by testing s1.field <= s2.field for each field. In the previous discussion, I left a comment pointing out that this is not how dictionary order works. I thought this was understood.

@magarick
Copy link
Contributor Author

@s-banach Ack, you're right! I was trying to be clever and avoid storing information for every row but that's not possible. Only inequality means we get to stop and only for that row.

This is a good question actually, @ritchie46 are structs intended to be sorted in dictionary order or by every entry. And what do you mean by row encoding. I tried looking at the code in sort and I couldn't figure out what it was supposed to do.

@magarick
Copy link
Contributor Author

In going through this I realized we need special handling for the case when not every field of a struct is None. It also turns out that null_count will return the total number of nulls in the struct but the validity will only be 0 if every element of a struct is null. Since null location is determined on a per field basis, the problem becomes much harder. This should still be doable without sorting or using RLE but it'll take just a bit longer. Thanks for your patience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants