-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: DataFrame reduction with min_count #41711
Conversation
a regression on master right? |
no 1.2.4 #41074 |
ok so this seems backportable. |
we should include the release note here. |
just copied over the whatsnew note from #41701 |
ok so plan is to merge this to master, then the backport to 1.2.5 |
I could maybe test this backportable (will be tomorrow) first to be sure. Or go ahead and merge and backport and sort out issues (if any) afterwards. |
#41758 - all seems good (test failures look unrelated) |
This comment has been minimized.
This comment has been minimized.
@meeseeksdev backport 1.2.x |
This comment has been minimized.
This comment has been minimized.
Co-authored-by: jbrockmendel <jbrockmendel@gmail.com>
@@ -245,8 +245,7 @@ def _maybe_get_mask( | |||
""" | |||
if mask is None: | |||
if is_bool_dtype(values.dtype) or is_integer_dtype(values.dtype): | |||
# Boolean data cannot contain nulls, so signal via mask being None | |||
return None | |||
return np.broadcast_to(False, values.shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel didn't check the rest of the PR, but was this change a crucial part of the fix?
There are bunch of benchmarks showing a slowdown (almost all Ops benchmarks involving integer and boolean data), so that might be caused by this (didn't check to be sure).
See https://pandas.pydata.org/speed/pandas/#regressions?sort=1&dir=desc, scroll a bit down until "2021-06-02 00:29"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but was this change a crucial part of the fix?
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason this seemed necessary was because _maybe_null_out
was returning a wrong shape when the mask was None -> #41920
tests copied from #41701; i think this gets at the root problem cc @simonjayhawkins