Bug/unexpected behaviour when using groupby and aggregation functions with DataFrame #1697

Closed
ichipper opened this issue Jul 27, 2012 · 0 comments

@ichipper

Here is code that reproduces the bug/unexpected behavior:

from pandas import DataFrame, MultiIndex

# Two-level column index: first level f1..f3, second level s1/s2.
midx = MultiIndex.from_tuples([('f1', 's1'), ('f1', 's2'), ('f2', 's1'),
                               ('f2', 's2'), ('f3', 's1'), ('f3', 's2')])
df = DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]], columns=midx)

# Keep only the f2 and f3 column blocks.
df1 = df.select(lambda u: u[0] in ['f2', 'f3'], axis=1)

# Group the columns by the first level of the column MultiIndex.
df1_group = df1.groupby(axis=1, level=0)
print(df1_group.groups)
print(df1_group.sum())

When running the code, we can see that df is:

   f1      f2      f3
   s1  s2  s1  s2  s1  s2
0   1   2   3   4   5   6
1   7   8   9  10  11  12

and df1, selected from the f2 and f3 sub-blocks of df, is:

   f2      f3
   s1  s2  s1  s2
0   3   4   5   6
1   9  10  11  12
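Although df1 displays only the f2 and f3 blocks, its column MultiIndex may still remember more than it shows. The sketch below targets recent pandas and assumes `DataFrame.select` is unavailable there (it was removed in pandas 1.0), so a boolean column mask applied with `.loc` stands in for it. It compares the level-0 labels df1 actually uses with the labels its MultiIndex still defines:

```python
import pandas as pd

# Same frame as in the report above.
midx = pd.MultiIndex.from_tuples([('f1', 's1'), ('f1', 's2'), ('f2', 's1'),
                                  ('f2', 's2'), ('f3', 's1'), ('f3', 's2')])
df = pd.DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]], columns=midx)

# Stand-in for df.select(lambda u: u[0] in ['f2', 'f3'], axis=1):
# a boolean mask over the column tuples, applied with .loc.
mask = [col[0] in ('f2', 'f3') for col in df.columns]
df1 = df.loc[:, mask]

# Level-0 labels that df1's columns actually use.
used = list(df1.columns.get_level_values(0).unique())
# Level-0 labels the MultiIndex still defines (slicing does not prune them).
defined = list(df1.columns.levels[0])

print(used)
print(defined)
```

Here `used` holds only f2 and f3, while `defined` still lists f1 as well, matching the stray label that turns up in the aggregated result.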

After grouping df1's columns by the first level of the column MultiIndex, df1_group.groups is:

{'f2': [('f2', 's1'), ('f2', 's2')], 'f3': [('f3', 's1'), ('f3', 's2')]}

However, when applying sum to aggregate the columns within each group, as in the example code, df1_group.sum() returns:

    f1  f2  f3
0  NaN   7  11
1  NaN  19  23

It appears the aggregation is driven by the columns of df rather than those of df1, so the result contains a column labeled 'f1' (filled with NaN), even though 'f1' does not exist in df1.
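One plausible reading, offered as an assumption since the report does not say what commit 48a3194 changed, is that grouping by level 0 consulted the labels df1's column MultiIndex still defines, which include 'f1', rather than the labels actually in use. The sketch below rebuilds that state directly with `pd.MultiIndex(levels=..., codes=...)` and prunes the unused entry with `MultiIndex.remove_unused_levels()`. It then computes the per-group sums by grouping the transposed frame, because `groupby(axis=1)` is deprecated in recent pandas (2.1 and later):

```python
import pandas as pd

# Rebuild df1's state: level 0 still defines 'f1', but no column
# uses it (the level-0 codes only reference 'f2' and 'f3').
cols = pd.MultiIndex(levels=[['f1', 'f2', 'f3'], ['s1', 's2']],
                     codes=[[1, 1, 2, 2], [0, 1, 0, 1]])
df1 = pd.DataFrame([[3, 4, 5, 6], [9, 10, 11, 12]], columns=cols)

# Drop level entries that no column refers to, so the defined
# labels match what df1 actually displays.
df1.columns = df1.columns.remove_unused_levels()

# Sum the columns within each first-level group: transpose so the
# column MultiIndex becomes the row index, group by level 0, sum,
# and transpose back.
result = df1.T.groupby(level=0).sum().T
print(result)
```

With the data from the report, this should yield f2 = 7 and 19 and f3 = 11 and 23, with no 'f1' column. That is the result the grouping shown in df1_group.groups implies.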

wesm closed this as completed in 48a3194 on Aug 12, 2012.