ENH: Support merging DataFrames on a combination of columns and index levels #14355

Closed
2 of 3 tasks
jonmmease opened this issue Oct 5, 2016 · 6 comments · Fixed by #17484
Labels
API Design Compat pandas objects compatability with Numpy or Python functions Enhancement Master Tracker High level tracker for similar issues
Milestone

Comments

@jonmmease
Contributor

jonmmease commented Oct 5, 2016

Overview

@jorisvandenbossche
As a part of the Pandas 1.0 goal to "Make the index/column distinction less painful (#5677, #8162)" I propose that the df.merge method support merging DataFrames on a combination of columns and index levels.

This could be accomplished in the API by allowing the on, left_on, and right_on keywords to accept a combination of column names and index level names. Any index levels that are joined on would be preserved as index levels in the resulting merged DataFrame, while all other index levels would be removed.
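
For illustration, here is a minimal sketch of what the proposed call could look like. The frame contents and the names `key1` and `key2` are made up for the example: `key1` is an index level and `key2` is a column in both frames. It assumes a pandas version in which `on` accepts index level names (to my knowledge 0.23 or later, following #17484).

```python
import pandas as pd

# Hypothetical data: "key1" lives in the index, "key2" is a regular column.
left = pd.DataFrame(
    {"A": ["A0", "A1", "A2", "A3"], "key2": ["K0", "K1", "K0", "K1"]},
    index=pd.Index(["K0", "K0", "K1", "K2"], name="key1"),
)
right = pd.DataFrame(
    {"C": ["C0", "C1", "C2", "C3"], "key2": ["K0", "K0", "K0", "K1"]},
    index=pd.Index(["K0", "K1", "K2", "K2"], name="key1"),
)

# Merge on a mix of an index level name ("key1") and a column name ("key2").
result = left.merge(right, on=["key1", "key2"])

# "key1" is joined on, so it stays an index level; "key2" stays a column.
print(result)
```

Both frames here have a single index level and it is one of the join keys, so it is preserved in the result, as the proposal describes.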

This proposal is in the spirit of #5677 for df.groupby and #14353 for df.sort_values.

@jreback added the Enhancement, API Design, Compat, and Master Tracker labels on Oct 6, 2016
@jorisvandenbossche
Member

+1 on this proposal. As I said in the related issue (#14353 (comment)), would be nice to have a general solution for this, but for now enabling this behaviour for specific functions/keywords is fine for me.

@jreback @wesm @TomAugspurger @shoyer @sinhrks @chris-b1 any concerns or feedback regarding this proposal before work is done to implement it? (@jmmease, do you plan to tackle this if OK?)

@jorisvandenbossche added this to the Next Major Release milestone on Oct 10, 2016
@jonmmease
Contributor Author

jonmmease commented Oct 10, 2016

@jorisvandenbossche Yes, if the direction is agreeable I plan to begin tackling this set of issues during the next month or two.
Thanks for the feedback!

@TomAugspurger
Contributor

I think I'm in favor of all the changes. Thanks for taking them on @jmmease!

@shoyer
Member

shoyer commented Oct 10, 2016

Yes, seems like a good idea to me.

@shoyer
Member

shoyer commented Oct 10, 2016

We do need to clarify how we will handle conflicting index/column names in a uniform way. For backwards compatibility, I think we need to always check column names before falling back to use index names.
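
A minimal sketch of the column-first lookup order described above. The helper `resolve_key` is hypothetical and not part of pandas; it only illustrates the fallback rule and is not a claim about how pandas implements or will implement `merge`.

```python
import pandas as pd


def resolve_key(df, key):
    """Hypothetical helper: look up `key` as a column first, then as an index level."""
    if key in df.columns:
        # Column names take precedence, which keeps existing code working.
        return df[key]
    if key in df.index.names:
        # Fall back to an index level only when no column has this name.
        return df.index.get_level_values(key)
    raise KeyError(key)


# "a" is both a column and the index name: the column is used.
both = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=pd.Index([10, 20], name="a"))
from_column = list(resolve_key(both, "a"))

# "a" is only an index level: the index level is used.
index_only = pd.DataFrame({"b": [3, 4]}, index=pd.Index([10, 20], name="a"))
from_index = list(resolve_key(index_only, "a"))

print(from_column, from_index)
```

Checking columns first means a frame that already has a column named like an index level behaves exactly as it did before index levels became valid keys.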

@jonmmease
Contributor Author

@shoyer Agreed regarding conflict resolution. Thanks for the feedback!

5 participants