pandas.merge issue with duplicated column names #11762

Closed
wavexx opened this issue Dec 4, 2015 · 9 comments
Labels
Error Reporting Incorrect or improved errors from pandas Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@wavexx

wavexx commented Dec 4, 2015

import pandas as pd

df1 = pd.DataFrame([[1, 1]], columns=['x','x'])
df2 = pd.DataFrame([[1, 1]], columns=['x','y'])
pd.merge(df1, df2, on='x')

Results in:

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    m = pd.merge(df1, df2, on='x')
  File "/usr/lib/python3/dist-packages/pandas/tools/merge.py", line 35, in merge
    return op.get_result()
  File "/usr/lib/python3/dist-packages/pandas/tools/merge.py", line 196, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "/usr/lib/python3/dist-packages/pandas/tools/merge.py", line 324, in _get_join_info
    sort=self.sort, how=self.how)
  File "/usr/lib/python3/dist-packages/pandas/tools/merge.py", line 516, in _get_join_indexers
    llab, rlab, shape = map(list, zip( * map(fkeys, left_keys, right_keys)))
  File "/usr/lib/python3/dist-packages/pandas/tools/merge.py", line 681, in _factorize_keys
    llab = rizer.factorize(lk)
  File "pandas/hashtable.pyx", line 850, in pandas.hashtable.Int64Factorizer.factorize (pandas/hashtable.c:15601)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

See #11754

@jreback
Contributor

jreback commented Dec 4, 2015

hmm, ok though a bit odd to do this.

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Difficulty Intermediate labels Dec 4, 2015
@jreback jreback modified the milestones: 0.18.0, Next Major Release Dec 4, 2015
@wavexx
Author

wavexx commented Dec 4, 2015

Yes, it's clearly not intentional what you're trying to do here.

@jreback
Contributor

jreback commented Dec 4, 2015

I would for now simply raise in this case (with a helpful message). If you have a nice use-case to actually do this, then let's reconsider.

@jreback jreback added Error Reporting Incorrect or improved errors from pandas and removed Bug labels Dec 4, 2015
@wavexx
Author

wavexx commented Dec 4, 2015

I would also raise an error here, but for symmetry, only if the two DataFrames have a different number of duplicates for the column.

I.e., this should work, even if with a warning:

df1 = pd.DataFrame([[1, 1]], columns=['x','x'])
df2 = pd.DataFrame([[1, 1]], columns=['x','x'])
pd.merge(df1, df2, on='x')

@jreback
Contributor

jreback commented Dec 4, 2015

IIRC we recently allowed duplicates when they are NOT the merge on column.

as I said I don't think merging on a duplicate column is ever warranted and I would just raise as this is a source of error/confusion. Having a special case is prob not necessary.
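To illustrate the distinction drawn above, here is a minimal sketch of the case that is described as allowed: the duplicated label is a regular data column, and the merge key itself is unique. The column names `k`, `x`, and `y` and the values are made up for illustration, and the behavior assumes a pandas version that includes the change referred to in this comment.

```python
import pandas as pd

# "x" is duplicated in the left frame, but it is NOT the merge key.
left = pd.DataFrame([[1, 10, 20]], columns=["k", "x", "x"])
right = pd.DataFrame([[1, 30]], columns=["k", "y"])

# The key "k" is unique in both frames, so key lookup is unambiguous.
merged = pd.merge(left, right, on="k")
print(list(merged.columns))
```

The duplicated `x` columns are carried through to the result; only a duplicated key column makes the `on=` lookup ambiguous.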

@wavexx
Author

wavexx commented Dec 4, 2015

On 04/12/15 17:55, Jeff Reback wrote:

IIRC we recently allowed duplicates when they are NOT the merge on column.

I see. If it was already agreed, I do not have a strong point for it.

@jreback
Contributor

jreback commented Dec 4, 2015

see #10639

@mitar
Contributor

mitar commented Apr 26, 2018

I think the issue here is that one can specify columns to merge on just by the column name, and not by the column index. If the latter were possible, then duplicate column names would not be a problem.
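Since `on=` takes labels rather than positions, one possible workaround is to make the labels unique by position before merging. The sketch below is only an illustration: `dedupe_columns` is a hypothetical helper, not part of pandas, and the `_1`, `_2` suffix scheme is an arbitrary choice that could collide with an existing column name.

```python
import pandas as pd


def dedupe_columns(df):
    """Return a copy of df whose repeated column labels get a numeric suffix.

    Hypothetical helper: the first occurrence keeps its name, later
    occurrences become "<name>_1", "<name>_2", and so on.
    """
    seen = {}
    new_cols = []
    for name in df.columns:
        count = seen.get(name, 0)
        new_cols.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    out = df.copy()
    out.columns = new_cols
    return out


df1 = pd.DataFrame([[1, 2]], columns=["x", "x"])
df2 = pd.DataFrame([[1, 3]], columns=["x", "y"])

# After renaming, "x" refers only to the first of the two original columns.
left = dedupe_columns(df1)
merged = pd.merge(left, df2, on="x")
print(merged)
```

Renaming makes the choice of key column explicit, which is roughly what selecting by column index would achieve.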

@mroeschke
Member

Looks like we have tested behavior for this now (with a clearer error message) in test_get_label_or_level_values_df_duplabels, so I think this issue is sufficiently addressed. Closing
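For readers who want to check the current behavior themselves, the original report can be rerun with the exception captured. This is a sketch that assumes only that the failure surfaces as a `ValueError` (the traceback in the report was already a `ValueError`); the exact wording of the newer message is not reproduced here and may differ between pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 1]], columns=["x", "x"])
df2 = pd.DataFrame([[1, 1]], columns=["x", "y"])

error = None
try:
    # "x" is ambiguous in df1, so the merge is expected to be rejected.
    pd.merge(df1, df2, on="x")
except ValueError as exc:
    error = exc

# Print whatever message the installed pandas version produces.
print(type(error).__name__, error)
```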

5 participants