fix: Allow chaining joins in SQL syntax #14220

michiel-de-muynck · 2024-02-02T17:31:53Z

With this fix, any of the previously joined table names can be used to refer to any of the previously joined tables. For example, in the query

SELECT * FROM df1
INNER JOIN df2 ON ...
INNER JOIN df3 ON <join condition>

in the join condition marked <join condition> you can use either df1. or df2. to refer to any column on the left side of the join. For instance, you can use df1.x even if column x only exists in df2. You can also use df2.y even if column y only exists in df1.

That is more generous than should be the case of course, in the sense that this new implementation accepts some invalid queries where it should give an error. However, in the current implementation, you can already use df1. to refer to any column (of dataframe df1 or df2) but cannot use df2 at all (see issue #14217).

alexander-beedie · 2024-02-02T20:39:47Z

Nice, I'll take a look; had almost finished fixing this (or something very similar) and then got wildly distracted by calamine bindings; I'll check this first thing tomorrow against some additional tests I added for verification - if it looks good then it's all yours 😉👌

alexander-beedie · 2024-02-04T17:01:47Z

Hi @migi, thanks for the PR - I took a look and tested it out but currently it only addresses a subset of the wider issue and, in its current state, doesn't actually return the correct result. The added unit test result should include an additional two columns for this query (on top of "a", "b", "c", "d"); specifically, both "b" and "c" should be repeated - here's a pure PostgreSQL version of the unit test to demonstrate:

I've also been working on fixing the current JOIN issues, accounting for the above (and some additional edge-cases connected with column resolution), and should have something ready in the near future (we have also been discussing how to handle duplicate columns in a way that is more consistent with SQL, rather than appending "_right", as we do for non-SQL joins, and that will be part of the coming update) 🤔

michiel-de-muynck · 2024-02-04T20:05:32Z

Thank you Alexander for the PR. It indeed only addresses the problem in #14217, not all of the problems with joins, that is a much bigger refactoring than the PR here.

The problem you describe also occurs with just a simple join instead of chained joins. For example, the query

SELECT * FROM df1
INNER JOIN df2 ON df1.b = df2.b

for this query Polars currently returns 3 column ( a, b and c), not 4 (a, df1.b, df2b and c).

Feel free to close the PR if the work you were doing encompasses the changes in this PR.

alexander-beedie · 2024-05-28T10:09:05Z

Thank you Alexander for the PR. It indeed only addresses the problem in #14217, not all of the problems with joins, that is a much bigger refactoring than the PR here.

Took me a while, but got there in the end... ;)

Feel free to close the PR if the work you were doing encompasses the changes in this PR.

Should all be dealt with by #16507, which can be merged after #16541 (and #16559).

Update: all merged now - fixes will be in 0.20.31 ✌️

Fix: Allow chaining joins

ed59d91

michiel-de-muynck requested review from ritchie46, stinodego, orlp, c-peters and universalmind303 as code owners February 2, 2024 17:31

Run rustfmt

6063bbe

stinodego changed the title ~~Fix: Allow chaining joins~~ fix: Allow chaining joins in SQL syntax Feb 15, 2024

stinodego added the A-sql Area: Polars SQL functionality label Feb 15, 2024

github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Feb 15, 2024

alexander-beedie closed this May 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Allow chaining joins in SQL syntax #14220

fix: Allow chaining joins in SQL syntax #14220

michiel-de-muynck commented Feb 2, 2024

alexander-beedie commented Feb 2, 2024 •

edited

Loading

alexander-beedie commented Feb 4, 2024 •

edited

Loading

michiel-de-muynck commented Feb 4, 2024

alexander-beedie commented May 28, 2024 •

edited

Loading

fix: Allow chaining joins in SQL syntax #14220

fix: Allow chaining joins in SQL syntax #14220

Conversation

michiel-de-muynck commented Feb 2, 2024

alexander-beedie commented Feb 2, 2024 • edited Loading

alexander-beedie commented Feb 4, 2024 • edited Loading

michiel-de-muynck commented Feb 4, 2024

alexander-beedie commented May 28, 2024 • edited Loading

alexander-beedie commented Feb 2, 2024 •

edited

Loading

alexander-beedie commented Feb 4, 2024 •

edited

Loading

alexander-beedie commented May 28, 2024 •

edited

Loading