feat(python): support bytecode translation to map_dict where the lookup key is an expression #10265

@alexander-beedie alexander-beedie commented Aug 3, 2023

Integrates BINARY_SUBSCR into the generic binary operator translation path so that we can support suggestions where the key is itself an expression.

This enables new suggestions for apply functions that use lookups, such as...

import polars as pl

ascii = { o:chr(o) for o in range(127) }

df = pl.DataFrame({
    "ord": [40,50,60,70,80,90,100,110],
}).with_columns(
    prev_chr = pl.col('ord').apply( lambda x: ascii[x-1] ),
    next_chr = pl.col('ord').apply( lambda x: ascii[x+1] ),
)

...which will now generate the following:

PolarsInefficientApplyWarning: 
Expr.apply is significantly slower than the native expressions API.
Only use if you absolutely CANNOT implement your logic otherwise.
In this case, you can replace your `apply` with the following:
  - pl.col("ord").apply(lambda x: ...)
  + (pl.col("ord") - 1).map_dict(ascii)

PolarsInefficientApplyWarning: 
Expr.apply is significantly slower than the native expressions API.
Only use if you absolutely CANNOT implement your logic otherwise.
In this case, you can replace your `apply` with the following:
  - pl.col("ord").apply(lambda x: ...)
  + (pl.col("ord") + 1).map_dict(ascii)
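To see where the expression key shows up, the bytecode of one of the lambdas above can be inspected with the standard-library `dis` module. This is only an illustrative sketch, not the code in `udfs.py`: the helper names `summarize` and `has_subscript` are hypothetical, and opcode names differ between CPython versions, so the printed listing is interpreter-specific.

```python
import dis

# Same lookup table as in the example above (it shadows the builtin `ascii`).
ascii = {o: chr(o) for o in range(127)}
fn = lambda x: ascii[x - 1]


def summarize(func):
    """Return (opname, argrepr) pairs for a function's bytecode (hypothetical helper)."""
    return [(ins.opname, ins.argrepr) for ins in dis.get_instructions(func)]


def has_subscript(ops):
    """True if the bytecode contains a subscript operation (hypothetical helper).

    Many CPython versions emit BINARY_SUBSCR for `a[b]`. The BINARY_OP branch is
    an assumption that covers interpreters which fold subscripts into BINARY_OP.
    """
    return any(
        name == "BINARY_SUBSCR" or (name == "BINARY_OP" and "[" in arg)
        for name, arg in ops
    )


# The key (`x - 1`) is computed by its own binary operation before the subscript.
for name, arg in summarize(fn):
    print(f"{name:<16} {arg}")
```

The arithmetic on `x` appears as a separate binary operation ahead of the subscript, which is why handling the subscript in the same translation path as the other binary operators lets the key be any translatable expression.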

Also: renamed BytecodeParser's can_rewrite method to can_attempt_rewrite, as we do not know if bytecode containing BINARY_SUBSCR is rewritable until we check frame variables during expression translation.
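The reason for the two-step check can be sketched in plain Python: a subscript on a dict and a subscript on a list compile to the same kind of bytecode, so only inspecting the object that the name resolves to tells them apart. The helpers `referenced_names` and `dict_lookup_targets` below are hypothetical and take an explicit `namespace` dict as a stand-in for the frame variables; they are not part of BytecodeParser.

```python
def referenced_names(func):
    """Names a function loads from outside its own arguments (hypothetical helper)."""
    code = func.__code__
    return set(code.co_names) | set(code.co_freevars)


def dict_lookup_targets(func, namespace):
    """Referenced names that resolve to a dict in `namespace` (hypothetical helper).

    `namespace` stands in for the caller's frame variables; this is an assumption
    made for illustration only.
    """
    return sorted(
        name
        for name in referenced_names(func)
        if isinstance(namespace.get(name), dict)
    )


# Both lambdas subscript a name with an expression key; the bytecode alone
# cannot tell whether the target is a dict.
by_dict = lambda x: lookup[x + 1]
by_list = lambda x: items[x + 1]

namespace = {"lookup": {1: "a", 2: "b"}, "items": ["a", "b", "c"]}

print(dict_lookup_targets(by_dict, namespace))
print(dict_lookup_targets(by_list, namespace))
```

Under these assumptions only `by_dict` yields a candidate for a `map_dict` suggestion, even though both lambdas would pass an opcode-only screening.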


(For the curious...)

df.with_columns(
    chr = pl.col("ord").map_dict(ascii),
    prev_chr = (pl.col("ord") - 1).map_dict(ascii),
    next_chr = (pl.col("ord") + 1).map_dict(ascii),
)
# shape: (8, 4)
# ┌─────┬─────┬──────────┬──────────┐
# │ ord ┆ chr ┆ prev_chr ┆ next_chr │
# │ --- ┆ --- ┆ ---      ┆ ---      │
# │ i64 ┆ str ┆ str      ┆ str      │
# ╞═════╪═════╪══════════╪══════════╡
# │ 40  ┆ (   ┆ '        ┆ )        │
# │ 50  ┆ 2   ┆ 1        ┆ 3        │
# │ 60  ┆ <   ┆ ;        ┆ =        │
# │ 70  ┆ F   ┆ E        ┆ G        │
# │ 80  ┆ P   ┆ O        ┆ Q        │
# │ 90  ┆ Z   ┆ Y        ┆ [        │
# │ 100 ┆ d   ┆ c        ┆ e        │
# │ 110 ┆ n   ┆ m        ┆ o        │
# └─────┴─────┴──────────┴──────────┘

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Aug 3, 2023
@alexander-beedie alexander-beedie force-pushed the improved-binary-subscr-translation branch from ad5c7bf to 12f71ee August 3, 2023 08:13

@MarcoGorelli MarcoGorelli left a comment

truly wonderful

py-polars/polars/utils/udfs.py (review thread resolved)
@alexander-beedie alexander-beedie merged commit 2d5511b into pola-rs:main Aug 3, 2023
18 checks passed
@alexander-beedie alexander-beedie deleted the improved-binary-subscr-translation branch August 3, 2023 21:04