Sum of grouped bool column has inconsistent type #7001

Closed
jkleint opened this issue Apr 29, 2014 · 4 comments · Fixed by #32894
Labels: Bug, Dtype Conversions, Groupby

jkleint commented Apr 29, 2014

Summing a bool column after a groupby gives a bool result until there are two or more True values, when it becomes a float64. Seems like it should always be an (unsigned?) integer. Straight sum without a groupby always gives an int64. This is with 0.13.1.

>>> pd.DataFrame([True]).groupby(lambda x: 0).sum()
      0
0  True

>>> pd.DataFrame([True,True]).groupby(lambda x: 0).sum()
   0
0  2

>>> pd.DataFrame([False]).groupby(lambda x: 0).sum()
       0
0  False

>>> pd.DataFrame([False,False]).groupby(lambda x: 0).sum()
       0
0  False

>>> pd.DataFrame([False,False,True]).groupby(lambda x: 0).sum()
      0
0  True

>>> pd.DataFrame([False,False,True,True]).groupby(lambda x: 0).sum()
   0
0  2

>>> pd.DataFrame([False,False]).sum()
0    0
dtype: int64
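
To see the dtype directly rather than inferring it from the printed values, the cases above can be looped over as in the sketch below. The helper name `grouped_sum_dtype` is made up for illustration and is not part of pandas. The reported dtypes depend on the installed pandas version, so no expected output is shown.

```python
import pandas as pd


def grouped_sum_dtype(values):
    """Return (sum, dtype) for a single-group sum over one column.

    Hypothetical helper for illustration only; not a pandas API.
    """
    # Every row maps to group 0, mirroring the lambda used in the report.
    result = pd.DataFrame(values).groupby(lambda x: 0).sum()
    return result.iloc[0, 0], result.dtypes.iloc[0]


cases = [
    [True],
    [True, True],
    [False],
    [False, False],
    [False, False, True],
    [False, False, True, True],
]

for values in cases:
    total, dtype = grouped_sum_dtype(values)
    print(values, "->", total, dtype)
```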

jreback commented Apr 29, 2014

This is a dupe of #3752, but I like your examples better, so will keep this issue!

It's possible to fix, but it hasn't been high on the list of priorities.

xflr6 commented Feb 14, 2016

As for getting float64 instead of int64 as result, a possible workaround is to use count_nonzero from numpy instead of sum to aggregate:

>>> import numpy as np
>>> pd.DataFrame([True,True]).groupby(lambda x: 0).agg(np.count_nonzero)[0]
0    2
Name: 0, dtype: int64
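
Another possible workaround, not mentioned above and offered only as a sketch, is to cast the bool column to an integer dtype before grouping, so that the sum never starts from bool values. The frame below is made-up example data.

```python
import pandas as pd

# Made-up example data: a single bool column.
df = pd.DataFrame([True, True])

# Cast to int64 up front so the grouped sum operates on integers, not bools.
counts = df.astype("int64").groupby(lambda x: 0).sum()[0]
print(counts)
print(counts.dtype)
```

The cast costs an extra copy of the data, but it makes the result dtype independent of how many True values each group happens to contain.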

ediphy-dwild commented

For some additional context: sometimes the user may not know they are dealing with a bool type. This can happen when performing a groupby on the result of pd.get_dummies, which may return columns of type uint8, but not always. If get_dummies returns uint16, the issue above is not triggered, and dummies_result.groupby(...).sum() returns int types. If any of the counts in dummies is small enough, the groupby result will be float.
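
One way to guard against this, sketched below under assumptions, is to inspect the dtypes that pd.get_dummies produced and to request an explicit integer dtype through its `dtype` parameter before grouping. The column names and data are invented for the example, and the default dtype that get_dummies returns varies between pandas versions.

```python
import pandas as pd

# Invented example data: a grouping key and a categorical column.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "color": ["red", "red", "blue", "red"],
})

# The default dtype of the dummy columns depends on the pandas version,
# so print it rather than assume it.
default_dummies = pd.get_dummies(df["color"])
print(default_dummies.dtypes)

# Requesting int64 explicitly means the grouped sum starts from integers.
dummies = pd.get_dummies(df["color"], dtype="int64")
counts = dummies.groupby(df["key"]).sum()
print(counts)
print(counts.dtypes)
```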

aflugge commented Nov 12, 2019

This is really confusing, as it means some code might work as expected on some data while running into an error on other data. I would much appreciate it if this could be fixed.

jreback modified the milestones: Contributions Welcome, 1.1 Mar 26, 2020