feat!: Preserve left and right join keys in outer joins #12963

Merged: 12 commits, Dec 11, 2023

Conversation

ritchie46
Member

@ritchie46 ritchie46 commented Dec 8, 2023

Previously, the result of an outer join did not contain the join keys of the left and right frames.
Rather, it contained a coalesced version of the left key and right key.
This loses information and does not conform to default SQL behavior.

The behavior has been changed to include the original join keys.
Name clashes are resolved by appending a suffix (_right by default) to the right join key name.
The previous behavior can be retained by setting how="outer_coalesce".

Example

Before:

>>> df1 = pl.DataFrame({"L1": ["a", "b", "c"], "L2": [1, 2, 3]})
>>> df2 = pl.DataFrame({"L1": ["a", "c", "d"], "R2": [7, 8, 9]})
>>> df1.join(df2, on="L1", how="outer")
shape: (4, 3)
┌─────┬──────┬──────┐
│ L1  ┆ L2   ┆ R2   │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  │
╞═════╪══════╪══════╡
│ a   ┆ 1    ┆ 7    │
│ c   ┆ 3    ┆ 8    │
│ d   ┆ null ┆ 9    │
│ b   ┆ 2    ┆ null │
└─────┴──────┴──────┘

After:

>>> df1.join(df2, on="L1", how="outer")
shape: (4, 4)
┌──────┬──────┬──────────┬──────┐
│ L1   ┆ L2   ┆ L1_right ┆ R2   │
│ ---  ┆ ---  ┆ ---      ┆ ---  │
│ str  ┆ i64  ┆ str      ┆ i64  │
╞══════╪══════╪══════════╪══════╡
│ a    ┆ 1    ┆ a        ┆ 7    │
│ b    ┆ 2    ┆ null     ┆ null │
│ c    ┆ 3    ┆ c        ┆ 8    │
│ null ┆ null ┆ d        ┆ 9    │
└──────┴──────┴──────────┴──────┘
>>> df1.join(df2, on="L1", how="outer_coalesce")  # Keeps previous behavior
shape: (4, 3)
┌─────┬──────┬──────┐
│ L1  ┆ L2   ┆ R2   │
│ --- ┆ ---  ┆ ---  │
│ str ┆ i64  ┆ i64  │
╞═════╪══════╪══════╡
│ a   ┆ 1    ┆ 7    │
│ c   ┆ 3    ┆ 8    │
│ d   ┆ null ┆ 9    │
│ b   ┆ 2    ┆ null │
└─────┴──────┴──────┘
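
To make the difference between the two schemas concrete, here is a minimal pure-Python sketch of the semantics described above. It is an illustration only, not how Polars implements joins: the helper names `outer_join_keep_keys` and `coalesce_keys` are hypothetical, keys are assumed unique on each side, and row order is not meant to match Polars output.

```python
# Illustrative model of the two outer-join schemas; not Polars internals.
# Assumption: join keys are unique within each input.
left = [("a", 1), ("b", 2), ("c", 3)]    # rows of (L1, L2)
right = [("a", 7), ("c", 8), ("d", 9)]   # rows of (L1, R2)


def outer_join_keep_keys(left, right):
    """New behavior: return rows as (L1, L2, L1_right, R2), None where a side is missing."""
    right_by_key = dict(right)
    left_keys = {key for key, _ in left}
    # Every left row appears once; the right key is None when there is no match.
    rows = [
        (key, value, key if key in right_by_key else None, right_by_key.get(key))
        for key, value in left
    ]
    # Right rows without a left match get None for both left columns.
    rows += [(None, None, key, value) for key, value in right if key not in left_keys]
    return rows


def coalesce_keys(rows):
    """Previous behavior: merge both key columns into one, giving (L1, L2, R2)."""
    return [
        (left_key if left_key is not None else right_key, l2, r2)
        for left_key, l2, right_key, r2 in rows
    ]


kept = outer_join_keep_keys(left, right)
coalesced = coalesce_keys(kept)

for row in kept:
    print(row)
```

In the kept-keys form, a null in `L1` or `L1_right` shows which side had no match. After coalescing, that information is gone, which is the loss of information the description refers to.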

@github-actions bot added the breaking, enhancement, python, and rust labels on Dec 8, 2023
@ritchie46 changed the title from "feat!: count valid values in col().count()" to "feat!: change outer join schema" on Dec 8, 2023
@stinodego added this to the 0.20.0 milestone on Dec 9, 2023
@stinodego linked an issue on Dec 9, 2023 that may be closed by this pull request
@ritchie46 requested a review from c-peters as a code owner on December 11, 2023 09:58
@ritchie46 merged commit 0c0786d into main on Dec 11, 2023
29 checks passed
@ritchie46 deleted the outer_join_schema branch on December 11, 2023 12:39
@stinodego changed the title from "feat!: change outer join schema" to "feat!: Preserve left and right join keys in outer joins" on Dec 15, 2023
@yusufshalaby

yusufshalaby commented Dec 18, 2023

This makes a lot of sense. Btw this breaking change was not listed in the release info and took me a bit of time to track.

@MarcoGorelli
Collaborator

@yusufshalaby it's listed here:

[image]

and also is the first item appearing in the upgrade guide: https://pola-rs.github.io/polars/releases/upgrade/0.20/

Perhaps you're looking at the wrong release notes?

@yusufshalaby

yusufshalaby commented Dec 18, 2023

Sorry! Didn't read it carefully enough. Thanks for clarifying.

@eJamie98

Echoing the comments from #9335, should this also be implemented for other join types? I've been tripped up a few times by left joins dropping my right-hand join column.

@ritchie46
Member Author

Will add an option for left joins later.

nezinomas pushed a commit to nezinomas/keeping that referenced this pull request Dec 22, 2023
Development

Successfully merging this pull request may close these issues.

Outer join fills in missing values in join columns.
6 participants