with_columns_seq should assign by expression #14935

Closed
uditrana opened this issue Mar 8, 2024 · 6 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@uditrana

uditrana commented Mar 8, 2024

Description

Polars prides itself on having a clean, readable interface. To that end, it should expose a with_columns-style option or function that evaluates expressions sequentially, so that each expression can refer to columns assigned by the ones before it. This has come up before in #11570.

Current:

df.with_columns(
    pl.col("a").add(1).alias("a_plus_1"),
).with_columns(
    pl.col("a_plus_1").add(pl.col("a")).alias("2a_plus_1"),
)

Ideally:

df.with_columns_seq_assign(
    pl.col("a").add(1).alias("a_plus_1"),
    pl.col("a_plus_1").add(pl.col("a")).alias("2a_plus_1"),
)

uditrana added the enhancement (New feature or an improvement of an existing feature) label on Mar 8, 2024
@McNickSisto

I second this.

@cmdlineluser
Contributor

The actual previous discussion for the desired functionality is:

(The _seq in this function is about parallelism.)

@uditrana
Author

uditrana commented Apr 9, 2024

Yeah, it seems like @ritchie46 was pretty firm on his position in that issue. It is a bit odd to me because the Polars API seems to have introduced multiple variants of methods (e.g. with_columns_seq), so maybe the philosophy has changed on this matter.

@avimallu
Contributor

avimallu commented Apr 13, 2024

For reference, @uditrana, I don't think the philosophy has changed significantly. with_columns_seq was created (#10322) specifically to

serve[s] merely as an optimization on small data, when parallelism overhead is too costly.

@ritchie46
Member

This has been discussed and we will not change the semantics of a with_columns. They will get their state from their context, not from incremental updates. The seq refers to sequential and is recommended to not be used. It is a micro-optimization that sometimes is worth it on tiny data.

ritchie46 closed this as not planned on Apr 13, 2024
@uditrana
Author

Yeah, I wasn't really suggesting changing the core semantics of with_columns, but rather adding an alternative option, like the seq version, for specific use cases. But it's not worth pushing further if you all don't see it as a useful stylistic addition.
