Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: pivot_table not returning correct type when margin=True and aggfunc='mean' #28248

Merged
merged 15 commits into from
Nov 23, 2019

Conversation

mabelvj
Copy link
Contributor

@mabelvj mabelvj commented Sep 1, 2019

What's new

Use of maybe_downcast_to_dtype in _add_margins so it can resolve the dtype conversion, avoiding floats being converted to integers when the result of the aggfunc is a float.

For the new case, if after applying the aggfunc, if the margin result is not an integer, the whole column is converted to float:

Example:

_df
   A  B  C  D
0  2  1  1  X
1  4  4  3  X
2  6  5  4  Y
3  8  8  6  Y

Currently pandas does this:

print(pd.pivot_table(_df, index="D", margins=True))                                                              
     A    B  C
D             
X    3  2.5  2
Y    7  6.5  5
All  5  4.5  3

with the fix the result is:

print(pd.pivot_table(_df, index="D", margins=True))                                                              
     A    B  C
D             
X    3  2.5  2.0
Y    7  6.5  5.0
All  5  4.5  3.5

Issues

There are test referencing that np.means of ints are casted back into ints. However, giving that for the aggregations in the rows, floats are kept when the np.mean of integers is a float, it does not make sense that this behavior does not hold for the margins.

@pep8speaks
Copy link

pep8speaks commented Sep 1, 2019

Hello @mabelvj! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-11-23 20:43:22 UTC

@simonjayhawkins
Copy link
Member

simonjayhawkins commented Sep 2, 2019

@mabelvj Thanks for the PR.

to fix the ci failure you'll need to run black pandas https://dev.pandas.io/development/contributing.html#python-pep8-black

also add test to confirm the fix works.

@mabelvj
Copy link
Contributor Author

mabelvj commented Sep 3, 2019

I see that there are tests referencing that np.means of ints are casted back into ints marked with pytest.mark.xfail that are related to this issue, should I update them?:

@pytest.mark.xfail(reason="GH#17035 (np.mean of ints is casted back to ints)")

@pytest.mark.xfail(reason="GH#17035 (np.mean of ints is casted back to ints)")

@TomAugspurger
Copy link
Contributor

should I update them?

Has the output changed? I would have expected them to fail CI if they were changed to no longer cast from float to int.

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Sep 4, 2019
@jreback jreback added this to the 1.0 milestone Sep 4, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. can you add a release note in reshaping bug fix section for 1.0

@mabelvj
Copy link
Contributor Author

mabelvj commented Sep 5, 2019

@TomAugspurger the results do not change for those, the tests are skipped if failing since they are marked to do so. Still, checking them (since they are related to this issue, because they check the float casting on the margins), the expected values do not match the output result (old and new pivot_table give the same output here).

pandas/tests/reshape/test_pivot.py Outdated Show resolved Hide resolved
@jreback
Copy link
Contributor

jreback commented Oct 6, 2019

this lgtm. @mabelvj can you merge master; ping on green.

@WillAyd
Copy link
Member

WillAyd commented Nov 7, 2019

ping @mabelvj - if you can fix up merge conflicts can get this one in

@mabelvj
Copy link
Contributor Author

mabelvj commented Nov 23, 2019

@jreback @WillAyd Fixed the conflicts, but Flake8 keeps giving a false positive F841 error on L1670-1671 because of unused variables, but they are used at the last line of the function.

It's strange because other functions have the same structure and do not raise any error.

@WillAyd
Copy link
Member

WillAyd commented Nov 23, 2019

I think you need to add tm.assert_result_equal(result, expected) to the end of the test

@mabelvj
Copy link
Contributor Author

mabelvj commented Nov 23, 2019

@WillAyd Fixed now. It seems the line disappeared when fixing previous conflicts with master.

@WillAyd WillAyd merged commit e0bd4d5 into pandas-dev:master Nov 23, 2019
@WillAyd
Copy link
Member

WillAyd commented Nov 23, 2019

Great - thanks @mabelvj for seeing this one through!

keechongtan added a commit to keechongtan/pandas that referenced this pull request Nov 25, 2019
…ndexing-1row-df

* upstream/master: (185 commits)
  ENH: add BooleanArray extension array (pandas-dev#29555)
  DOC: Add link to dev calendar and meeting notes (pandas-dev#29737)
  ENH: Add built-in function for Styler to format the text displayed for missing values (pandas-dev#29118)
  DEPR: remove statsmodels/seaborn compat shims (pandas-dev#29822)
  DEPR: remove Index.summary (pandas-dev#29807)
  DEPR: passing an int to read_excel use_cols (pandas-dev#29795)
  STY: fstrings in io.pytables (pandas-dev#29758)
  BUG: Fix melt with mixed int/str columns (pandas-dev#29792)
  TST: add test for ffill/bfill for non unique multilevel (pandas-dev#29763)
  Changed description of parse_dates in read_excel(). (pandas-dev#29796)
  BUG: pivot_table not returning correct type when margin=True and aggfunc='mean'  (pandas-dev#28248)
  REF: Create _lib/window directory (pandas-dev#29817)
  Fixed small mistake (pandas-dev#29815)
  minor cleanups (pandas-dev#29798)
  DEPR: enforce deprecations in core.internals (pandas-dev#29723)
  add test for unused level raises KeyError (pandas-dev#29760)
  Add documentation linking to sqlalchemy (pandas-dev#29373)
  io/parsers: ensure decimal is str on PythonParser (pandas-dev#29743)
  Reenabled no-unused-function (pandas-dev#29767)
  CLN:F-string in pandas/_libs/tslibs/*.pyx (pandas-dev#29775)
  ...

# Conflicts:
#	pandas/tests/frame/indexing/test_indexing.py
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pivot_table margins=True default aggfunc='mean' does integer division
6 participants