Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Fix: pd.DataFrame.sum with min_count changes dtype if result contains NaNs #47091

Merged
merged 11 commits into from
May 29, 2022

Conversation

weikhor
Copy link
Contributor

@weikhor weikhor commented May 22, 2022

@pep8speaks
Copy link

pep8speaks commented May 22, 2022

Hello @weikhor! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-05-29 14:33:58 UTC

@@ -1472,7 +1472,8 @@ def _maybe_null_out(
if np.iscomplexobj(result):
result = result.astype("c16")
else:
result = result.astype("f8")
if not is_float_dtype(result):
result = result.astype("f8")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this an elif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add copy=False

@@ -780,7 +780,7 @@ Indexing
- Bug in setting large integer values into :class:`Series` with ``float32`` or ``float16`` dtype incorrectly altering these values instead of coercing to ``float64`` dtype (:issue:`45844`)
- Bug in :meth:`Series.asof` and :meth:`DataFrame.asof` incorrectly casting bool-dtype results to ``float64`` dtype (:issue:`16063`)
- Bug in :meth:`NDFrame.xs`, :meth:`DataFrame.iterrows`, :meth:`DataFrame.loc` and :meth:`DataFrame.iloc` not always propagating metadata (:issue:`28283`)
-
- Bug in :meth:`_maybe_null_out` pd.DataFrame.sum with min_count changes dtype if result contains NaNs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_maybe_null_out is private. Please reference the public functions, e.g. sum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok Have updated

# GH#46947
df = DataFrame({"a": [1.0, 2.3, 4.4], "b": [2.2, 3, np.nan]}, dtype="float32")
result = df.sum(**kwargs).dtype
assert result == "float32"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test the whole DataFrame and parametrize over all possible float dtypes.

@weikhor weikhor requested a review from phofl May 29, 2022 14:44
@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions Reduction Operations sum, mean, min, max, etc. labels May 29, 2022
@jreback jreback added this to the 1.5 milestone May 29, 2022
@jreback jreback merged commit c355145 into pandas-dev:main May 29, 2022
@jreback
Copy link
Contributor

jreback commented May 29, 2022

thanks @weikhor very nice. can you add similar tests for .prod() (which i think should just work).

@weikhor weikhor deleted the bug_fix_min_count_changes_dtype branch May 30, 2022 00:04
@weikhor weikhor restored the bug_fix_min_count_changes_dtype branch May 30, 2022 00:17
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions Reduction Operations sum, mean, min, max, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.DataFrame.sum with min_count changes dtype if result contains NaNs
4 participants