Grouping by index and column fails on DataFrame with single index #14327

Closed
jonmmease opened this issue Oct 1, 2016 · 0 comments · Fixed by #14428

Referenced in #5677

Example

The following snippet shows how a MultiIndex DataFrame (df) may be grouped by a combination of a column (B) and a named index level (inner) using a Grouper object.

import pandas as pd
import numpy as np

idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3)])
idx.names = ['outer', 'inner']
df = pd.DataFrame({"A": np.arange(6), 'B': ['one', 'one', 'two', 'two', 'one', 'one']}, index=idx)

In [1]: df
Out[1]: 
             A    B
outer inner        
a     1      0  one
      2      1  one
      3      2  two
b     1      3  two
      2      4  one
      3      5  one

In [2]: df.groupby(['B', pd.Grouper(level='inner')]).mean()
Out [2]: 
             A
B   inner     
one 1      0.0
    2      2.5
    3      5.0
two 1      3.0
    3      2.0

However, when the DataFrame (df2) has only a single index level, the same groupby call raises an AttributeError:

In [3]: df2 = df.reset_index('outer')

In [4]: df2
Out [4]: 
      outer  A    B
inner              
1         a  0  one
2         a  1  one
3         a  2  two
1         b  3  two
2         b  4  one
3         b  5  one

In [5]: df2.groupby(['B', pd.Grouper(level='inner')]).mean()  

...
AttributeError: 'Int64Index' object has no attribute 'labels'

Expected Output

In [5]: df2.groupby(['B', pd.Grouper(level='inner')]).mean()
Out[5]: 
             A
B   inner     
one 1      0.0
    2      2.5
    3      5.0
two 1      3.0
    3      2.0
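
Until the Grouper path handles a single-level index, one possible workaround is to move the index into a regular column and group by column names only. The sketch below is an assumption on my part, not the fix for this issue. The names `workaround` and the explicit `[['A']]` column selection are illustrative; the selection is there so that the string column `outer` is not passed to `mean()`.

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the example above.
idx = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3)],
    names=['outer', 'inner'],
)
df = pd.DataFrame(
    {'A': np.arange(6), 'B': ['one', 'one', 'two', 'two', 'one', 'one']},
    index=idx,
)
df2 = df.reset_index('outer')  # single index level named 'inner'

# Workaround sketch (an assumption, not the eventual fix):
# turn the 'inner' index into a column, then group by two column names.
workaround = df2.reset_index().groupby(['B', 'inner'])[['A']].mean()
print(workaround)
```

This avoids pd.Grouper(level=...) entirely, at the cost of an extra reset_index() copy.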

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: 1.4.1
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None