Indexing with namedtuple is broken #1026

Closed
echlebek opened this issue Apr 11, 2012 · 1 comment

@echlebek

It is possible to index a MultiIndexed DataFrame that has multiple index columns, one or more of which have a compound type. However, it is not possible to index a DataFrame with a regular Index whose values have a compound type, nor is it possible to index a MultiIndexed DataFrame with a single index column that has a compound type.

tl;dr - I can't index a DataFrame with a namedtuple, even though I can create one.

In the first example, I try to index a dataframe with a namedtuple with a regular Index, which fails.

In the second example, I index a dataframe with a tuple of namedtuples (MultiIndex), which succeeds.

In the third example, I try to index a dataframe with a length-1 tuple of namedtuples, again with a MultiIndex, which fails.

from collections import namedtuple
import pandas

# First example
""" 
>>> IndexType = namedtuple("IndexType", ["a", "b"])
>>> idx1 = IndexType("foo", "bar")
>>> idx2 = IndexType("baz", "bof")
>>> index = pandas.Index([idx1, idx2], name="composite_index")
>>> index
Index([IndexType(a='foo', b='bar'), IndexType(a='baz', b='bof')], dtype=object)
>>> df = pandas.DataFrame([(1, 2), (3, 4)], index=index, columns=["A", "B"])
>>> df
                             A  B
composite_index..................
IndexType(a='foo', b='bar')  1  2
IndexType(a='baz', b='bof')  3  4
>>> df.ix[IndexType("foo", "bar")]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    return self._getitem_tuple(key)
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    return self._getitem_lowerdim(tup)
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    section = self._getitem_axis(key, axis=i)
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    return self._get_label(idx, axis=0)
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    return self.obj.xs(label, axis=axis, copy=True)
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    loc = self.index.get_loc(key)
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    return self._engine.get_loc(key)
  File "engines.pyx", line 101, in pandas._engines.DictIndexEngine.get_loc (pandas/src/engines.c:2498)
  File "engines.pyx", line 108, in pandas._engines.DictIndexEngine.get_loc (pandas/src/engines.c:2460)
KeyError: 'foo'
""" 

# Second example

""" 
>>> mult_index = pandas.MultiIndex.from_tuples([(idx1, idx2)], names=["comp_1", "comp_2"])
>>> mult_index
MultiIndex([(IndexType(a='foo', b='bar'), IndexType(a='baz', b='bof'))], dtype=object)
>>> df = pandas.DataFrame([(1, 2, 3, 4)], index=mult_index, columns=["A", "B", "C", "D"])
>>> df
                                                         A  B  C  D
comp_1                      comp_2.................................
IndexType(a='foo', b='bar') IndexType(a='baz', b='bof')  1  2  3  4
>>> df.ix[(IndexType("foo", "bar"), IndexType("baz", "bof"))]
A    1   
B    2   
C    3   
D    4   
Name: (IndexType(a='foo', b='bar'), IndexType(a='baz', b='bof'))
""" 

# Third example

""" 
>>> index = pandas.MultiIndex.from_tuples([(IndexType("foo", "bar"),), (IndexType("baz", "bof"),)], names=["ind#
>>> index
Index([IndexType(a='foo', b='bar'), IndexType(a='baz', b='bof')], dtype=object)
>>> df = pandas.DataFrame([(1, 2), (3, 4)], index=index, columns=["A", "B"])
>>> df
                             A  B
index............................
IndexType(a='foo', b='bar')  1  2
IndexType(a='baz', b='bof')  3  4
>>> df.ix[IndexType("foo", "bar")]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    return self._getitem_tuple(key)
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    return self._getitem_lowerdim(tup)
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    section = self._getitem_axis(key, axis=i)
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    return self._get_label(idx, axis=0)
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    return self.obj.xs(label, axis=axis, copy=True)
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    loc = self.index.get_loc(key)
  File "/Network/Cluster/home/echlebek/.virtualenvs/pandas/lib/python2.6/site-packages/pandas-0.7.3.dev_3d4d5af#
    return self._engine.get_loc(key)
  File "engines.pyx", line 101, in pandas._engines.DictIndexEngine.get_loc (pandas/src/engines.c:2498)
  File "engines.pyx", line 108, in pandas._engines.DictIndexEngine.get_loc (pandas/src/engines.c:2460)
KeyError: 'foo'
>>> df.ix[(IndexType("foo", "bar"),)]
      A   B
foo NaN NaN
bar NaN NaN
"""
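
One possible reading of the `KeyError: 'foo'` in the first and third examples: the tracebacks pass through `_getitem_tuple`, which suggests the namedtuple key is being handled like a plain tuple of per-axis labels, so its first field is looked up on its own. The sketch below only illustrates why a type check based on `isinstance` cannot tell the two apart. The helper `first_axis_label` is hypothetical and is not pandas code.

```python
from collections import namedtuple

IndexType = namedtuple("IndexType", ["a", "b"])
key = IndexType("foo", "bar")

# A namedtuple is a tuple subclass, so an isinstance check treats it
# exactly like a plain tuple of labels.
print(isinstance(key, tuple))  # True
print(type(key) is tuple)      # False


def first_axis_label(k):
    """Hypothetical dispatch (an assumption, not the pandas source):
    a tuple key is unpacked and its first element is used for axis 0."""
    return k[0] if isinstance(k, tuple) else k


# The namedtuple is unpacked, so 'foo' is what gets looked up.
print(first_axis_label(key))   # foo
```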
@wesm wesm closed this as completed in 0b5a007 Apr 14, 2012
@dtcaciuc

The issue looks a bit more complicated now...

First of all, we realized the above test is reporting a false positive because of #1069.

Secondly, an additional problem lies here. In particular, _is_list_like prevents using any iterable object as an Index key.

At this point, it's a question of where you want to go with the indexing interface. I think it might be reasonable to limit the types (aside from Index itself) used for supplying index sequences to, say, tuple, list and numpy.array. The upside is not having to think about adding more exceptions (currently there's basestring, plus, in our case, a tuple subclass); the downside is not supporting arbitrary iterables such as generators. I would personally be in favour of the former because it is the simpler of the two (in both internal logic and behaviour) in the long run.
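
To make the trade-off concrete, here is a minimal sketch of the two approaches. It is not the pandas implementation: `is_index_sequence`, `is_iterable_not_string` and `SEQUENCE_TYPES` are hypothetical names, numpy.array is left out so the sketch needs only the standard library, and `str` stands in for Python 2's basestring.

```python
from collections import namedtuple

# Hypothetical whitelist of types accepted as a sequence of index labels.
# The proposal also mentions numpy.array; it is omitted here on purpose.
SEQUENCE_TYPES = (list, tuple)


def is_index_sequence(key):
    """Whitelist approach: only exact list/tuple instances are sequences.

    Subclasses such as namedtuples fall through and count as one label.
    """
    return type(key) in SEQUENCE_TYPES


def is_iterable_not_string(key):
    """Permissive approach: any iterable is a sequence, strings excepted.

    Every label type that happens to be iterable needs its own exception.
    """
    if isinstance(key, str):
        return False
    try:
        iter(key)
    except TypeError:
        return False
    return True


IndexType = namedtuple("IndexType", ["a", "b"])
key = IndexType("foo", "bar")

print(is_index_sequence(key))         # False: treated as a single label
print(is_index_sequence([key]))       # True: a list of labels
print(is_iterable_not_string(key))    # True: would be split into fields
```

With the whitelist, a generator is rejected as a sequence, which is the downside mentioned above; callers would have to wrap it in `list(...)` first.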

@wesm wesm reopened this May 2, 2012
@wesm wesm closed this as completed in de427aa May 7, 2012