Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python): address unexpected expression name from use of unary - or + operators #11158

Merged

Conversation

alexander-beedie
Copy link
Collaborator

@alexander-beedie alexander-beedie commented Sep 17, 2023

Closes #11149.

Because we implement -expr as lit(0) - expr (and +expr as lit(0) + expr) the output name of the resulting expressions is literal. While this makes sense (leftmost expr naming convention), it is not something that the caller expects as this is an internal detail that they know nothing about (and shouldn't have to read the source to discover ;)

In accordance with "the principle of least surprise", this PR maintains the expected name, as determined from the rhs expr (via meta.output_name), and not the anonymous/internal lhs lit. As well as resulting in the expected name for a single op, this also ensures that you can easily negate multiple columns without having to explicitly alias them all (to avoid a duplicate colnames error).

Notes

I can see that #5669 wanted to codify the naming behaviour as "literal", but I think that's not the way to go as this is an internal implementation detail that the user doesn't see (and therefore can't predict/understand) and that could change (if we wanted to use a dedicated .neg() expression method on the Rust side, change it to expr * -1, etc; the name should stay stable irrespective of the underlying implementation).

Example

Negating a single column:

df = pl.DataFrame({"x": [3, 2, 1]})
df.with_columns(-pl.col("x"))

Before

┌─────┬─────────┐
│ x   ┆ literal │
│ --- ┆ ---     │
│ i64 ┆ i64     │
╞═════╪═════════╡
│ 3   ┆ -3      │
│ 2   ┆ -2      │
│ 1   ┆ -1      │
└─────┴─────────┘

After

┌─────┐
│ x   │
│ --- │
│ i64 │
╞═════╡
│ -3  │
│ -2  │
│ -1  │
└─────┘

Negating multiple columns:

pl.DataFrame(
    {
        "x": [3, 2, 1],
        "y": [6, 7, 8],
    }
).with_columns(-pl.col("x"), -pl.col("y"))

Before

ComputeError: The name: 'literal' passed to `LazyFrame.with_columns` is duplicate

After

┌─────┬─────┐
│ x   ┆ y   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ -3  ┆ -6  │
│ -2  ┆ -7  │
│ -1  ┆ -8  │
└─────┴─────┘

@alexander-beedie alexander-beedie changed the title fix(python): address unexpected expression name on use of __neg__ or __pos__ operators fix(python): address unexpected expression name from use of unary - or + operators Sep 17, 2023
@github-actions github-actions bot added fix Bug fix python Related to Python Polars labels Sep 17, 2023
@gab23r
Copy link
Contributor

gab23r commented Sep 17, 2023

Isn't it simpler to code it as expr * lit(-1) ?

@alexander-beedie
Copy link
Collaborator Author

alexander-beedie commented Sep 17, 2023

Isn't it simpler to code it as expr * lit(-1) ?

That has almost the same behaviour, but not quite; for instance, you can multiply a UInt8 expr by -1, resulting in an Int32, but you'll get an overflow error if you try to negate it; this PR maintains the current behaviour, whereas multiplication would change it by (potentially) modifying the column dtype.

@gab23r
Copy link
Contributor

gab23r commented Sep 17, 2023

Ok, thanks for the explanation !

@ritchie46 ritchie46 merged commit bb7776f into pola-rs:main Sep 19, 2023
23 checks passed
@alexander-beedie alexander-beedie deleted the unary-ops-consistent-expr-naming branch September 19, 2023 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Name of column with unary operator '-' becomes 'literal'
3 participants