
Add optional validation output to joins #2377

Closed
zundertj opened this issue Jan 14, 2022 · 6 comments · Fixed by #15559
Labels: accepted (Ready for implementation), enhancement (New feature or an improvement of an existing feature)

Comments

@zundertj
Collaborator

(Originally raised in a comment on #2292.)

@austospumanto:
Separately, on the problem of two columns becoming one column in the join result: it would be great if polars could retain both columns in the join like pandas does (when the two columns have different names). This is useful for checking for nulls in non-inner joins to see which rows found matches, and also for situations like the one you stated. I find myself duplicating+suffixing columns before joining to get this behavior in polars.

Suggestion by me:
It may be easier to offer an indicator column as an optional output, as pandas does: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html?highlight=merge#pandas.DataFrame.merge
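
For reference, this is a small sketch of the pandas behaviour being referred to. The example frames are made up; `indicator=True` and the resulting `_merge` column are documented pandas features.

```python
import pandas as pd

# Hypothetical example frames sharing an "id" key.
left = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "y": [20, 30, 40]})

# indicator=True adds a categorical "_merge" column whose values are
# "left_only", "right_only", or "both".
merged = left.merge(right, on="id", how="outer", indicator=True)
merged = merged.sort_values("id").reset_index(drop=True)
print(merged)
```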

@ritchie46
Member

ritchie46 commented Jan 15, 2022

The indicator seems like a good option to me. If we make it a boolean, then it will also be very memory efficient.

@zundertj
Collaborator Author

If boolean, that would have to be two indicators: one for whether the row occurs in the original left dataframe, and one for the right dataframe. I see pandas opts for a single categorical to cover left/right/both.
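
To make the correspondence concrete, here is a sketch that derives two boolean indicators from the pandas categorical. The `in_left` and `in_right` column names are hypothetical and are not part of any existing API.

```python
import pandas as pd

# Hypothetical example frames sharing an "id" key.
left = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "y": [20, 30, 40]})

merged = left.merge(right, on="id", how="outer", indicator=True)

# Split the three-valued categorical into two boolean columns.
flags = (
    merged.assign(
        in_left=merged["_merge"].isin(["left_only", "both"]),
        in_right=merged["_merge"].isin(["right_only", "both"]),
    )
    .drop(columns="_merge")
    .sort_values("id")
    .reset_index(drop=True)
)
print(flags)
```

The two booleans carry the same information as the categorical: `both` corresponds to both flags being true, and the remaining two values to exactly one flag being true.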

@ritchie46
Member

That would still save 2/8 of the RAM. Two indicators sound good to me.

@eutwt

eutwt commented Dec 23, 2022

I think the validate option in pandas would be good also. (originally suggested in #5883, which was closed as a dupe of this issue)
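
As a reminder of what the pandas option does, here is a minimal sketch. The frames and the `is_one_to_one` helper are made up for illustration; `validate="one_to_one"` and `pd.errors.MergeError` are real pandas names.

```python
import pandas as pd

# Hypothetical frames; the right-hand side has a duplicated key (2).
left = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]})
right_dupes = pd.DataFrame({"id": [2, 2, 4], "y": [20, 21, 40]})


def is_one_to_one(left_df, right_df, key):
    """Return True if merging on `key` passes pandas' one-to-one check."""
    try:
        left_df.merge(right_df, on=key, how="left", validate="one_to_one")
    except pd.errors.MergeError:
        return False
    return True


print(is_one_to_one(left, right_dupes, "id"))
print(is_one_to_one(left, right_dupes.drop_duplicates("id"), "id"))
```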

@zundertj
Collaborator Author

Note that the validate option has recently been added in #9278.

stinodego added the enhancement label (New feature or an improvement of an existing feature) and removed the feature label on Jul 14, 2023

@mauricioabur

Hi. I'm not experienced at suggesting changes, so I apologize if this isn't the right way to do it. Although the validate option has been added, an 'indicator' option like the one in pandas would be very helpful. As far as I know, this feature hasn't been added yet, and the issue explicitly requesting it (#5983) has been closed. Thanks!

c-peters added the accepted label (Ready for implementation) on Apr 15, 2024
c-peters added this to Backlog on Apr 15, 2024
c-peters moved this to Done in Backlog on Apr 15, 2024