reset_index fails with MultiIndex in columns #2017

gerigk · 2012-10-04T19:39:27Z

import pandas as pd
import numpy as np
df = pd.DataFrame([(1,2,3), (4,5,6)], columns = ['a','b','c'])
df.groupby(['a']).agg({'b': [np.mean, np.sum]}).reset_index()

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-1-a362e367701d> in <module>()
      2 import numpy as np
      3 df = pd.DataFrame([(1,2,3), (4,5,6)], columns = ['a','b','c'])
----> 4 df.groupby(['a']).agg({'b': [np.mean, np.sum]}).reset_index()

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in reset_index(self, level, drop, inplace)
   2522             if name is None or name == 'index':
   2523                 name = 'index' if 'index' not in self else 'level_0'
-> 2524             new_obj.insert(0, name, _maybe_cast(self.index.values))
   2525 
   2526         new_obj.index = new_index

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in insert(self, loc, column, value)
   1857         """
   1858         value = self._sanitize_column(column, value)
-> 1859         self._data.insert(loc, column, value)
   1860 
   1861     def _sanitize_column(self, key, value):

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in insert(self, loc, item, value)
    899             raise Exception('cannot insert %s, already exists' % item)
    900 
--> 901         new_items = self.items.insert(loc, item)
    902         self.set_items_norename(new_items)
    903 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/index.pyc in insert(self, loc, item)
   2340         if not isinstance(item, tuple) or len(item) != self.nlevels:
   2341             raise Exception("%s cannot be inserted in this MultiIndex"
-> 2342                             % str(item))
   2343 
   2344         new_levels = []

Exception: a cannot be inserted in this MultiIndex
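
The last frame of the traceback rejects 'a' because MultiIndex.insert expects a tuple with one entry per column level, and the index name is a plain string. Below is a minimal sketch of that constraint. It assumes a recent pandas release and uses the string names "mean" and "sum" in place of np.mean and np.sum; padding the second level with an empty string is an illustrative choice here, not something the traceback prescribes.

```python
import pandas as pd

df = pd.DataFrame([(1, 2, 3), (4, 5, 6)], columns=["a", "b", "c"])
agg = df.groupby("a").agg({"b": ["mean", "sum"]})

# The aggregated columns form a two-level MultiIndex.
print(agg.columns.nlevels)
print(agg.columns.tolist())

# A key with one entry per level can be inserted. The empty string
# for the second level is an assumption made for this illustration.
padded = agg.columns.insert(0, ("a", ""))
print(padded.tolist())
```

So any fix inside reset_index has to turn the index name into a full-length tuple before inserting it as a column.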


changhiskhan · 2012-10-04T22:54:43Z

right now I have it do this:

In [1]: paste
import pandas as pd
import numpy as np
df = pd.DataFrame([(1,2,3), (4,5,6)], columns = ['a','b','c'])
df.groupby(['a']).agg({'b': [np.mean, np.sum]}).reset_index()

## -- End pasted text --
Out[1]: 
   a     b     
      mean  sum
0  1     2    2
1  4     5    5

Do you guys have any opinions on the defaults here (@wesm @jseabold @lodagro)?
Specifically:

What level in the columns to insert into by default? (right now the first)
What should the other levels be filled by? (right now empty string)
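
For readers on a recent pandas, the two questions above correspond to the col_level and col_fill arguments of reset_index. The sketch below is hedged: it assumes a current pandas release rather than 0.9.0, uses string aggregation names, and the fill value "key" is only a placeholder chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame([(1, 2, 3), (4, 5, 6)], columns=["a", "b", "c"])
agg = df.groupby("a").agg({"b": ["mean", "sum"]})

# Defaults: the index name goes into the first column level and the
# remaining level is filled with an empty string.
default = agg.reset_index()
print(default.columns.tolist())

# Put the index name into the second level instead, and fill the
# first level with a placeholder label ("key" is hypothetical).
lower = agg.reset_index(col_level=1, col_fill="key")
print(lower.columns.tolist())
```

With the defaults, the new column is addressed as default[("a", "")]; in the second call it becomes lower[("key", "a")].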

wesm · 2012-10-05T00:41:37Z

I think these defaults are fine (for now, though I don't see what else might be preferable)

lodagro · 2012-10-05T07:04:25Z

Good example of where NaN in a MultiIndex would be useful. The current workaround is to use an empty string.

changhiskhan · 2012-10-05T07:14:46Z

Yeah, I actually started out using NaN as the default placeholder, but there are some issues with NaN support in MultiIndex, as @wesm pointed out in another issue. The behavior when col_fill is NaN differs from the behavior with any other fill value:

In [6]: df.groupby(['a']).agg({'b': [np.mean, np.sum]}).reset_index(col_fill=np.nan).stack()
Out[6]:
          b  a
0 mean    2  1
  sum     2  1
  nan   NaN  1
1 mean    5  4
  sum     5  4
  nan   NaN  4

In [7]: rs = df.groupby(['a']).agg({'b': [np.mean, np.sum]}).reset_index(col_fill=np.nan)

In [8]: rs.stack()
Out[8]:
          b  a
0 mean    2  1
  sum     2  1
  nan   NaN  1
1 mean    5  4
  sum     5  4
  nan   NaN  4

In [9]: rs = df.groupby(['a']).agg({'b': [np.mean, np.sum]}).reset_index()
In [10]: rs.stack()
Out[10]:
         b    a
0 mean   2  NaN
  sum    2  NaN
       NaN    1
1 mean   5  NaN
  sum    5  NaN
       NaN    4


changhiskhan pushed a commit that referenced this issue Oct 5, 2012

BUG: reset_index fails with MultiIndex in columns #2017

3861008

wesm closed this as completed Oct 5, 2012
