DataFrame.ix losing row ordering when index has duplicates #3561

dalejung · 2013-05-10T07:18:40Z

import pandas as pd

ind = ['A', 'A', 'B', 'C']
df = pd.DataFrame({'test':range(len(ind))}, index=ind)

rows = ['C', 'B']
res = df.ix[rows]
assert rows == list(res.index) # fails

The problem is that the resulting DataFrame keeps the ordering of the df.index and not the rows key. You'll notice that the rows key doesn't reference a duplicate value.
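
For reference, the expected ordering can be sketched as below. This is only an illustration and rests on one assumption: it uses `.loc` as a stand-in, because `.ix` was later deprecated and removed from pandas, so it shows the behaviour on a recent pandas rather than on the version reported here.

```python
import pandas as pd

# Index with a duplicated label ('A'), as in the report above.
ind = ['A', 'A', 'B', 'C']
df = pd.DataFrame({'test': range(len(ind))}, index=ind)

# The key does not touch the duplicated label at all.
rows = ['C', 'B']

# Assumption: .loc stands in for .ix, which no longer exists in recent pandas.
res = df.loc[rows]

# The result should follow the order of the key, not the order of df.index.
order = list(res.index)
values = list(res['test'])
print(order, values)
```

If the selection follows the key, `order` matches `rows` and `values` lists the value for 'C' before the value for 'B'.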


jreback · 2013-05-10T14:09:23Z

Thanks for the catch. This is a case that happened to work, but it was using a set-like indexer, so the order was not guaranteed. It provided an opportunity to refactor a bit... PR coming soon.

jreback · 2013-05-10T14:40:28Z

Unique

In [1]: df=DataFrame(randn(5,3),index=list('ABCDE'))

In [2]: df.ix[['A']]
Out[2]: 
          0         1        2
A -1.048431 -0.435366  0.33573

In [3]: df.ix[['A','G']]
Out[3]: 
          0         1        2
A -1.048431 -0.435366  0.33573
G       NaN       NaN      NaN

Duplicate

In [4]: dfnu=DataFrame(randn(5,3),index=list('AABCD'))

In [5]: dfnu.ix[['A']]
Out[5]: 
          0         1         2
A  0.039932  1.049630 -2.647776
A -0.213537  0.747972 -0.830574

In [7]: dfnu.ix[['B','A','E']]
Out[7]: 
          0         1         2
B  0.292704 -1.396854 -0.414920
A  0.039932  1.049630 -2.647776
A -0.213537  0.747972 -0.830574

@dalejung @y-p @wesm
ok... the ordering behavior is fixed, but what do you think about the last case,
i.e. selecting a label that doesn't exist (when at least one other requested label does exist)?
In the unique case you get the equivalent of reindexing (the missing label comes back as a row of NaN).
Should I fix the duplicate case to do the same?
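
A minimal sketch of what "equivalent of reindexing" means for the unique case. It calls `DataFrame.reindex` directly instead of `.ix`, which is an assumption made so the snippet runs on a recent pandas. It covers only the unique index; it does not show what the duplicate case does or should do.

```python
import numpy as np
import pandas as pd

# Unique index, mirroring the "Unique" session above.
df = pd.DataFrame(np.random.randn(5, 3), index=list('ABCDE'))

# 'A' exists and 'G' does not; reindex keeps both labels in the requested order.
res = df.reindex(['A', 'G'])

labels = list(res.index)
missing_is_nan = bool(res.loc['G'].isna().all())   # row for the absent label
present_matches = bool((res.loc['A'] == df.loc['A']).all())  # existing row unchanged
print(labels, missing_is_nan, present_matches)
```

The row for the missing label is filled with NaN rather than dropped, which matches the `Out[3]` output shown above.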

ghost · 2013-05-10T14:53:59Z

re:

In [41]: dfnu=DataFrame(randn(4,3),index=list('ABCD'))

In [42]: dfnu.ix[['E']]
Out[42]: 
    0   1   2
E NaN NaN NaN

In [43]: dfnu=DataFrame(randn(5,3),index=list('AABCD'))

In [44]: dfnu.ix[['E']]
Out[44]: 
Empty DataFrame
Columns: [0, 1, 2]
Index: []

yeah, that is inconsistent.

dalejung · 2013-05-10T20:16:55Z

I think for consistency's sake it should be the same. To be honest, I don't have a use case for indexing a non-existent label or an iterable key that contains a duplicate. I came across the bug when a source file upstream had a duplicate row.

Thanks for the quick patch.

jreback · 2013-05-10T20:31:12Z

np... we have been fixing duplicate indices lately (again, not that there is that much use for them), but they should work... will be merged soon

jreback mentioned this issue May 10, 2013

BUG: (GH3561) non-unique indexers with a list-like now return in the same order as the passed values #3563

Merged

jreback closed this as completed in #3563 May 14, 2013
