groupby/resample of boolean series can result in bool or float dtype depending on values #24469

Closed
kevinsa5 opened this issue Dec 28, 2018 · 1 comment
Labels
Duplicate Report Duplicate issue or pull request

Comments

@kevinsa5

>>> import pandas as pd
>>> pd.__version__
u'0.23.4'
>>> 
>>> df = pd.DataFrame({
...     'x': [1,1,2,2,3,3],
...     'y': [False, False, False, True, True, True]
... })
>>> 
>>> print df.loc[df['x'] < 3,:].groupby('x')['y'].sum()
x
1    False
2     True
Name: y, dtype: bool
>>> print df.loc[:,:].groupby('x')['y'].sum()
x
1    0.0
2    1.0
3    2.0
Name: y, dtype: float64

When performing groupby or resample summations on a boolean series, the output dtype will be bool if all of the sums are 0 or 1, and float if any of the sums is greater than 1. I expected the dtype of the output series to depend only on the dtype of the input series, not on its values.
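
A possible workaround, sketched here rather than taken from the pandas docs for this issue: cast the boolean column to an explicit integer dtype before grouping, so the sum is a plain count of True values and its dtype no longer depends on how large the counts are. The names `counts_df`, `subset` and `full` are made up for this illustration; the data is the frame from the report above.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 1, 2, 2, 3, 3],
    'y': [False, False, False, True, True, True],
})

# Replace the boolean column with an int64 copy before grouping, so the
# aggregation sums integers instead of booleans.
counts_df = df.assign(y=df['y'].astype('int64'))

# Same two groupings as in the report: groups 1-2 only, then all groups.
subset = counts_df.loc[counts_df['x'] < 3].groupby('x')['y'].sum()
full = counts_df.groupby('x')['y'].sum()

print(subset.dtype, full.dtype)
```

With the cast in place, both results should come back with the same integer dtype whether or not any group contains more than one True.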

@chris-b1
Contributor

Duplicate of #7001. A PR to fix this would be welcome. I'm guessing we try to cast back to the original dtype after the sum, so if the sums are only 1s and 0s the cast goes through, and otherwise it doesn't. Agreed that the result should always be numeric.
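
To make that guess concrete, here is a rough NumPy sketch of a "cast back only if nothing changes" step. It is an assumption about the kind of logic involved, not the actual pandas implementation, and `maybe_cast_back` is a hypothetical helper written just for this illustration.

```python
import numpy as np

def maybe_cast_back(result, dtype):
    """Hypothetical helper: return `result` cast to `dtype` only when the
    cast round-trips without changing any value; otherwise return it as is."""
    candidate = result.astype(dtype)
    if np.array_equal(candidate.astype(result.dtype), result):
        return candidate
    return result

# Float sums like those in the report: groups 1-2 only, then all three groups.
sums_two_groups = np.array([0.0, 1.0])
sums_three_groups = np.array([0.0, 1.0, 2.0])

print(maybe_cast_back(sums_two_groups, np.bool_).dtype)    # bool
print(maybe_cast_back(sums_three_groups, np.bool_).dtype)  # float64
```

Under this assumed logic, sums of only 0 and 1 survive the round trip through bool and get cast back, while a sum of 2 would turn into True (1.0) and so the float result is kept, which matches the value-dependent dtype in the report.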

@chris-b1 chris-b1 added the Duplicate Report Duplicate issue or pull request label Dec 28, 2018
@chris-b1 chris-b1 added this to the No action milestone Dec 28, 2018