Support by/by_left/by_right arguments in join_where #19684

Open
mcrumiller opened this issue Nov 7, 2024 · 0 comments
Labels
enhancement New feature or an improvement of an existing feature

Description

join_asof currently lets us supply by/by_left/by_right columns to do a "regular" equality join on, in addition to the as-of columns. join_where has no equivalent: we must explicitly supply an equality condition for each extra column, which I assume (maybe incorrectly) takes a less efficient path than a regular join on those columns would:

import polars as pl
from polars import col

df1 = pl.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [1, 2, 3, 4],
})
df2 = pl.DataFrame({
    "a": [1, 2, 2, 3],
    "c": [1, 2, 5, 6],
})

# Current syntax: explicitly provide equality columns
df1.join_where(
    df2,
    col("b") < col("c"),
    col("a") == col("a_right"),
)
# shape: (2, 3)
# ┌─────┬─────┬─────┐
# │ a   ┆ b   ┆ c   │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═════╪═════╪═════╡
# │ 2   ┆ 3   ┆ 5   │
# │ 2   ┆ 4   ┆ 5   │
# └─────┴─────┴─────┘

# Proposed syntax: use by
df1.join_where(
    df2,
    col("b") < col("c"),
    by="a",  # proposed: a column name or a list of names, as in join_asof
)