BUG: pivot_table not returning correct type when margin=True and aggfunc='mean' #28248
@mabelvj Thanks for the PR. To fix the CI failure you'll need to run … Please also add a test to confirm the fix works.
I see that some tests reference `np.mean` of ints being cast back to ints; they are marked in:
- pandas/pandas/tests/reshape/test_pivot.py, line 1604 (commit 03b3c8f)
- pandas/pandas/tests/reshape/test_pivot.py, line 1618 (commit 03b3c8f)
Has the output changed? I would have expected those tests to fail CI if the results no longer cast from float to int.
LGTM. Can you add a release note in the reshaping bug-fix section for 1.0?
@TomAugspurger The results do not change for those: the tests are marked as expected to fail, so a failure does not break CI. Still, having checked them (they are related to this issue because they check the float casting on the margins), their expected values do not match the actual output, and the old and new `pivot_table` give the same output there.
This LGTM. @mabelvj, can you merge master? Ping on green.
Ping @mabelvj: if you can fix the merge conflicts, we can get this one in.
@jreback @WillAyd I fixed the conflicts, but flake8 keeps reporting what looks like a false-positive F841 (local variable assigned but never used) on L1670-1671, even though the variables are used in the last line of the function. It's strange, because other functions have the same structure and do not raise any error.
I think you need to add
Branch updated from dd121da to 456efd2.
@WillAyd Fixed now. It seems the line was lost while resolving earlier conflicts with master.
Great, thanks @mabelvj for seeing this one through!
- `black pandas`
- `git diff upstream/master -u -- "*.py" | flake8 --diff`
What's new
Use `maybe_downcast_to_dtype` in `_add_margins` so that it resolves the dtype conversion, avoiding floats being converted to integers when the result of the `aggfunc` is a float. With this change, if the margin result after applying the `aggfunc` is not an integer, the whole column is converted to float:
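The rule described above can be modeled in plain Python. This is a simplified sketch, not pandas code: `add_margin` and `add_margin_truncating` are hypothetical helpers standing in for the dtype handling in `_add_margins`, with a Python list of ints playing the role of an integer column. The margin stays an integer only when that conversion is lossless; otherwise the whole column is promoted to float instead of the margin being truncated.

```python
from typing import List, Union

Number = Union[int, float]


def add_margin(column: List[int], margin: float) -> List[Number]:
    """Hypothetical helper (not part of pandas): append a margin to an integer column.

    Models the "downcast only if lossless" idea: the margin is converted to int
    only when it is a whole number; otherwise every value becomes a float.
    """
    if float(margin).is_integer():
        # Lossless: keep the integer "dtype" for the whole column.
        return column + [int(margin)]
    # Lossy: promote the whole column to float rather than truncating the margin.
    return [float(v) for v in column] + [float(margin)]


def add_margin_truncating(column: List[int], margin: float) -> List[int]:
    """Hypothetical model of the behavior being fixed: force the margin back to int."""
    return column + [int(margin)]


print(add_margin([1, 2], 4 / 3))             # [1.0, 2.0, 1.3333333333333333]
print(add_margin([1, 3], 2.0))               # [1, 3, 2]
print(add_margin_truncating([1, 2], 4 / 3))  # [1, 2, 1]
```

The design choice mirrors the description: promoting the column costs only a dtype change, whereas truncating the margin silently changes its value.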
Example:
Currently, pandas does this:
With the fix, the result is:
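A hypothetical reproduction of this case, using invented data rather than the PR's own example: each group's mean is a whole number, so the row results can be held as integers, but the grand mean in the "All" margin is 4/3. Based on the description above, a pandas release that includes this fix should keep the margin as a float and make the whole `val` column float; the exact printed layout may differ between pandas versions.

```python
import pandas as pd

# Invented data: group "a" averages to 1 and group "b" to 2, but the overall mean is 4/3.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 1, 2]})

table = pd.pivot_table(df, index="key", values="val", aggfunc="mean", margins=True)
print(table)

# The "All" row holds the grand mean of [1, 1, 2]; with the fix it is not truncated to 1.
margin = table.loc["All", "val"]
print(margin, table["val"].dtype)
```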
Issues
There are tests asserting that `np.mean` of ints is cast back to ints. However, given that the row aggregations keep floats when the `np.mean` of integers is a float, it does not make sense for the margins to behave differently.
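The row behavior this argument relies on can be checked with another invented example. Here group "a" has a fractional mean (1.5), so the row results are floats; the margin, the mean of all three values (7/3), should then be a float as well. This is an illustrative sketch assuming a pandas release that includes this fix, not a test taken from the PR.

```python
import pandas as pd

# Invented data: the mean of group "a" is 1.5, so the row aggregations are floats.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 4]})

table = pd.pivot_table(df, index="key", values="val", aggfunc="mean", margins=True)

row_a = table.loc["a", "val"]    # mean of [1, 2], kept as a float
grand = table.loc["All", "val"]  # mean of [1, 2, 4], consistent with the rows
print(row_a, grand)
```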