reset_index fails with MultiIndex in columns #2017

Closed
gerigk opened this issue Oct 4, 2012 · 4 comments

@gerigk

gerigk commented Oct 4, 2012

import pandas as pd
import numpy as np
df = pd.DataFrame([(1,2,3), (4,5,6)], columns = ['a','b','c'])
df.groupby(['a']).agg({'b': [np.mean, np.sum]}).reset_index()

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-1-a362e367701d> in <module>()
      2 import numpy as np
      3 df = pd.DataFrame([(1,2,3), (4,5,6)], columns = ['a','b','c'])
----> 4 df.groupby(['a']).agg({'b': [np.mean, np.sum]}).reset_index()

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in reset_index(self, level, drop, inplace)
   2522             if name is None or name == 'index':
   2523                 name = 'index' if 'index' not in self else 'level_0'
-> 2524             new_obj.insert(0, name, _maybe_cast(self.index.values))
   2525 
   2526         new_obj.index = new_index

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/frame.pyc in insert(self, loc, column, value)
   1857         """
   1858         value = self._sanitize_column(column, value)
-> 1859         self._data.insert(loc, column, value)
   1860 
   1861     def _sanitize_column(self, key, value):

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/internals.pyc in insert(self, loc, item, value)
    899             raise Exception('cannot insert %s, already exists' % item)
    900 
--> 901         new_items = self.items.insert(loc, item)
    902         self.set_items_norename(new_items)
    903 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.0rc2-py2.7-linux-x86_64.egg/pandas/core/index.pyc in insert(self, loc, item)
   2340         if not isinstance(item, tuple) or len(item) != self.nlevels:
   2341             raise Exception("%s cannot be inserted in this MultiIndex"
-> 2342                             % str(item))
   2343 
   2344         new_levels = []

Exception: a cannot be inserted in this MultiIndex
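
For context, the check at the bottom of the traceback rejects any item that is not a tuple with one entry per column level. A minimal sketch of that constraint, assuming a current pandas release and an empty-string placeholder chosen only for illustration:

```python
import pandas as pd

# Two-level column index like the one produced by agg({'b': [mean, sum]}).
cols = pd.MultiIndex.from_tuples([("b", "mean"), ("b", "sum")])

# A key for this index needs one entry per level, so the index name 'a'
# is padded with a placeholder ('' here, an assumption for illustration).
new_cols = cols.insert(0, ("a", ""))

print(new_cols.nlevels)   # 2
print(new_cols.tolist())  # [('a', ''), ('b', 'mean'), ('b', 'sum')]
```

The bare string 'a' in the failing call has no such padding, which is why the 0.9.0rc2 check raises.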
@changhiskhan
Contributor

right now I have it do this:

In [1]: paste
import pandas as pd
import numpy as np
df = pd.DataFrame([(1,2,3), (4,5,6)], columns = ['a','b','c'])
df.groupby(['a']).agg({'b': [np.mean, np.sum]}).reset_index()

## -- End pasted text --
Out[1]: 
   a     b     
      mean  sum
0  1     2    2
1  4     5    5

Do you guys have any opinions on the defaults here (@wesm @jseabold @lodagro)?
Specifically:

  1. What level in the columns to insert into by default? (right now the first)
  2. What should the other levels be filled by? (right now empty string)
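
The two defaults in question map onto the `col_level` and `col_fill` arguments of `reset_index`. A small sketch of how they interact, assuming a current pandas release; the fill label "key" is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame([(1, 2, 3), (4, 5, 6)], columns=["a", "b", "c"])
agg = df.groupby("a").agg({"b": ["mean", "sum"]})

# Defaults: the index name goes into the first column level and the
# remaining level is filled with an empty string.
default = agg.reset_index()
print(default.columns.tolist())  # [('a', ''), ('b', 'mean'), ('b', 'sum')]

# Alternative: put the index name in the second level and fill the
# first level with an explicit label ("key" is a hypothetical choice).
custom = agg.reset_index(col_level=1, col_fill="key")
print(custom.columns.tolist())   # [('key', 'a'), ('b', 'mean'), ('b', 'sum')]
```

Either way the inserted column ends up with a full tuple key, so it can be selected like any other column, e.g. `default[("a", "")]`.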

@wesm
Member

wesm commented Oct 5, 2012

I think these defaults are fine (for now, though I don't see what else might be preferable)

@wesm wesm closed this as completed Oct 5, 2012
@lodagro
Contributor

lodagro commented Oct 5, 2012

Good example of where NaN in a MultiIndex would be useful. Currently working around it by using an empty string.

@changhiskhan
Contributor

Yeah, I actually started out using NaN as the default placeholder, but there are some issues with the NaN support in MultiIndex, as @wesm pointed out in another issue. The behavior when col_fill is NaN differs from the behavior with any other fill value:

In [6]: df.groupby(['a']).agg({'b': [np.mean, np.sum]}).reset_index(col_fill=np.nan).stack()
Out[6]:
          b  a
0 mean    2  1
  sum     2  1
  nan   NaN  1
1 mean    5  4
  sum     5  4
  nan   NaN  4

In [7]: rs = df.groupby(['a']).agg({'b': [np.mean, np.sum]}).reset_index(col_fill=np.nan)

In [8]: rs.stack()
Out[8]:
          b  a
0 mean    2  1
  sum     2  1
  nan   NaN  1
1 mean    5  4
  sum     5  4
  nan   NaN  4

In [9]: rs = df.groupby(['a']).agg({'b': [np.mean, np.sum]}).reset_index()
In [10]: rs.stack()
Out[10]:
          b    a
0 mean    2  NaN
  sum     2  NaN
        NaN    1
1 mean    5  NaN
  sum     5  NaN
        NaN    4

yarikoptic added a commit to neurodebian/pandas that referenced this issue Nov 15, 2012
Version 0.9.0

* tag 'v0.9.0': (43 commits)
  RLS: Version 0.9.0 final
  Fix groupby.median documentation
  BUG: need extra slash on windows for file://
  BUG: default pandas.io.data start date 1/1/2000 per docs. close pandas-dev#2011
  clean up tests
  Allow DataFrame.update to accept non DataFrame object and attempt to coerce.
  ENH: Use given name for DataFrame column name for FRED API
  BLD: quiet tox warning about missing dep
  BUG: reset_index fails with MultiIndex in columns pandas-dev#2017
  BUG: with_statement in test_console_encode() (3a11f00) broke 2.5 test suite
  BUG: dict comprehension in (af3e13c) broke 2.6 test suite
  BUG: Timestamp dayofyear returns day of month pandas-dev#2021
  BUG: pandas breaks mpl plot_date
  DOC: update parsers header, names args doc
  BUG: read_csv regression, moved date parsing to before type conversions now so can parse yymmdd hhmm format now pandas-dev#1905
  Fix naming of ewmvar and ewmstd in documentation
  DOC: whats new for pandas-dev#2000
  ENH: change default header names in read_* functions from X.1, X.2, ... to X0, X1, ... close pandas-dev#2000
  TST: make test suite pass cleanly on python 3 with no matplotlib
  BUG: datetime64 formatting issues in DataFrame.to_csv. close pandas-dev#1993
  ...