BUG: aggregate(sum) returns wrong result for certain boolean input #7666

Closed
veor opened this issue Jul 4, 2014 · 1 comment
Labels: Bug, Dtype Conversions (Unexpected or buggy dtype conversions), Duplicate Report (Duplicate issue or pull request), Groupby

veor commented Jul 4, 2014

I have a DataFrame in the following format:

df = pd.DataFrame({'foo': [1, 2, 2], 'bar': [True, False, False]})

I want to group this by foo and count the number of True values in the bar column. Counting the True values can be done with the built-in sum function:

In [7]: bar = [True, False, True, False, False]

In [8]: sum(bar)
Out[8]: 2

In [9]: sum(df['bar'])
Out[9]: 1
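For reference, the intended per-group result can be computed without pandas at all. This is a minimal plain-Python sketch (the helper `count_true_by_key` is hypothetical and not part of pandas) that groups by the foo values and sums the bar flags, relying on the fact that Python's bool is a subclass of int:

```python
from collections import defaultdict


def count_true_by_key(keys, flags):
    """Count True values per key, mirroring groupby('foo')['bar'] followed by sum."""
    counts = defaultdict(int)
    for key, flag in zip(keys, flags):
        # int(True) == 1 and int(False) == 0, so this adds one per True value
        counts[key] += int(flag)
    return dict(counts)


foo = [1, 2, 2]
bar = [True, False, False]
print(count_true_by_key(foo, bar))  # {1: 1, 2: 0}
```

Note that the values come back as integers (1 and 0), not as True and False, which is what the groupby call below should also produce.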

To group and count this:

In [16]: df.groupby('foo').aggregate(sum)
Out[16]:
       bar
foo
1     True
2    False

This output is wrong: the bar column contains booleans instead of counts. The expected output is:

       bar
foo
1      1
2      0
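A possible workaround until this is fixed (a sketch, not an official fix; it assumes only that converting the column to integers before grouping leaves no bool dtype for the aggregated result to be turned back into) is to cast bar with `astype(int)` and group that Series by the foo column:

```python
import pandas as pd

df = pd.DataFrame({'foo': [1, 2, 2], 'bar': [True, False, False]})

# Cast the boolean column to integers *before* grouping, so the per-group
# sums are computed on (and returned as) integers rather than booleans.
counts = df['bar'].astype(int).groupby(df['foo']).sum()
print(counts)
```

This should give a count of 1 for foo == 1 and 0 for foo == 2, indexed by foo.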

It works in the following case, where not all of the bar values for foo == 2 are False:

In [18]: df = pd.DataFrame({'foo': [1, 2, 2, 2, 2], 'bar': [True, True, True, False, False]})
In [18]: df.groupby('foo').aggregate(sum)
Out[18]:
     bar
foo
1      1
2      2
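One explanation consistent with both outputs (an assumption; the report does not say why, and this is a simplified model, not pandas' actual code): after aggregating, the result is cast back to the column's original dtype whenever that round trip looks lossless. Per-group sums of 1 and 0 survive a cast to bool because 1 == True and 0 == False, while a sum of 2 does not. The function `cast_back_if_lossless` below is hypothetical and only reproduces that rule:

```python
def cast_back_if_lossless(results, original_type=bool):
    """Hypothetical 'restore the input dtype if nothing is lost' rule.

    A simplified model of the behaviour reported above, not pandas' code.
    """
    converted = [original_type(r) for r in results]
    # The check compares with ==, and in Python 1 == True and 0 == False,
    # so sums that are all 0 or 1 pass and are silently turned into bools.
    if all(c == r for c, r in zip(converted, results)):
        return converted
    return results


# Per-group sums from the two DataFrames in this report
print(cast_back_if_lossless([1, 0]))  # [True, False]  (the wrong output)
print(cast_back_if_lossless([1, 2]))  # [1, 2]  (the case that "works")
```

Under this model the bug only shows up when every group's count happens to be 0 or 1, which matches the two examples above.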

Here are my installed versions:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.7.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.0
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.1.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2014.3
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.5
lxml: 3.3.5
bs4: 4.3.1
html5lib: None
bq: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None
veor changed the title from "aggregate(sum) returns wrong result for certain boolean input" to "BUG: aggregate(sum) returns wrong result for certain boolean input" Jul 4, 2014
jreback added this to the 0.15.0 milestone Jul 4, 2014
jreback (Contributor) commented Jul 4, 2014

thanks, this is a dupe of #7001

jreback closed this as completed Jul 4, 2014