DataFrame.ix losing row ordering when index has duplicates #3561

Closed
dalejung opened this issue May 10, 2013 · 5 comments · Fixed by #3563
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Comments

@dalejung
Contributor

import pandas as pd

ind = ['A', 'A', 'B', 'C']
df = pd.DataFrame({'test':range(len(ind))}, index=ind)

rows = ['C', 'B']
res = df.ix[rows]
assert rows == list(res.index) # fails

The problem is that the resulting DataFrame keeps the ordering of the df.index and not the rows key. You'll notice that the rows key doesn't reference a duplicate value.

@jreback
Contributor

jreback commented May 10, 2013

Thanks for the catch. This is a case that worked, but it was using a set-like indexer, so the order was not guaranteed. It provided an opportunity to refactor a bit... PR coming soon.
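
To illustrate why a set-like indexer can lose the requested order, here is a minimal pure-Python sketch. It is not the pandas implementation; the function names `select_by_membership` and `select_in_key_order` are hypothetical and exist only for this example. The first function filters the index by membership and so returns rows in index order; the second walks the keys and so returns rows in key order, including every duplicate of a label.

```python
def select_by_membership(index, keys):
    """Set-like selection: keep positions whose label is in keys (index order)."""
    wanted = set(keys)
    return [pos for pos, label in enumerate(index) if label in wanted]


def select_in_key_order(index, keys):
    """Order-preserving selection: walk the keys, collecting every matching position."""
    # Map each label to all of its positions so duplicates are kept together.
    positions = {}
    for pos, label in enumerate(index):
        positions.setdefault(label, []).append(pos)
    result = []
    for key in keys:
        result.extend(positions.get(key, []))
    return result


index = ['A', 'A', 'B', 'C']
rows = ['C', 'B']

by_membership = [index[p] for p in select_by_membership(index, rows)]
in_key_order = [index[p] for p in select_in_key_order(index, rows)]
print(by_membership)  # ['B', 'C'] (index order, as in the report)
print(in_key_order)   # ['C', 'B'] (order of the rows key)
```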

@jreback
Contributor

jreback commented May 10, 2013

Unique

In [1]: df=DataFrame(randn(5,3),index=list('ABCDE'))

In [2]: df.ix[['A']]
Out[2]: 
          0         1        2
A -1.048431 -0.435366  0.33573

In [3]: df.ix[['A','G']]
Out[3]: 
          0         1        2
A -1.048431 -0.435366  0.33573
G       NaN       NaN      NaN

Duplicate

In [4]: dfnu=DataFrame(randn(5,3),index=list('AABCD'))

In [5]: dfnu.ix[['A']]
Out[5]: 
          0         1         2
A  0.039932  1.049630 -2.647776
A -0.213537  0.747972 -0.830574

In [7]: dfnu.ix[['B','A','E']]
Out[7]: 
          0         1         2
B  0.292704 -1.396854 -0.414920
A  0.039932  1.049630 -2.647776
A -0.213537  0.747972 -0.830574

@dalejung @y-p @wesm
OK, the ordering behavior is fixed, but what do you think about the last case,
i.e. selecting a label that doesn't exist (when at least one requested label does exist)?
In the unique case you get the equivalent of reindexing (a NaN row for the missing label).
Should I fix the duplicate case to do the same?

@ghost

ghost commented May 10, 2013

re:

In [41]: dfnu=DataFrame(randn(4,3),index=list('ABCD'))

In [42]: dfnu.ix[['E']]
Out[42]: 
    0   1   2
E NaN NaN NaN

In [43]: dfnu=DataFrame(randn(5,3),index=list('AABCD'))

In [44]: dfnu.ix[['E']]
Out[44]: 
Empty DataFrame
Columns: [0, 1, 2]
Index: []

yeah, that is inconsistent.
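
One consistent rule would be to treat both cases like a reindex: emit rows in key order, expand duplicates, and produce a single filler row for a label that is absent. The sketch below is pure Python, not pandas code, and not necessarily what the eventual fix does; `take_like_reindex` is a hypothetical helper, and `None` stands in for a NaN-filled row.

```python
def take_like_reindex(index, values, keys):
    """Return (label, value) pairs in key order; a missing label yields one None row."""
    # Map each label to all of its positions so duplicates are expanded.
    positions = {}
    for pos, label in enumerate(index):
        positions.setdefault(label, []).append(pos)
    out = []
    for key in keys:
        hits = positions.get(key)
        if hits:
            out.extend((key, values[p]) for p in hits)
        else:
            out.append((key, None))  # None stands in for a NaN-filled row
    return out


# Unique index, missing label: one filler row.
unique = take_like_reindex(list('ABCD'), [10, 20, 30, 40], ['E'])
# Duplicate index, mixed existing and missing labels: same rule applies.
dup = take_like_reindex(list('AABCD'), [10, 11, 20, 30, 40], ['B', 'A', 'E'])
print(unique)  # [('E', None)]
print(dup)     # [('B', 20), ('A', 10), ('A', 11), ('E', None)]
```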

@dalejung
Contributor Author

I think for consistency's sake it should be the same. To be honest, I don't have a use case for indexing a non-existent label or an iterable key that contains a duplicate. I came across the bug when a source file upstream had a duplicate row.

Thanks for the quick patch.

@jreback
Contributor

jreback commented May 10, 2013

np... we have been fixing duplicate indices lately (again, not that there is that much use for them), but they should work... will be merged soon
