REGR: DataFrame reduction with min_count #41711

Merged
merged 4 commits into from
Jun 1, 2021

jbrockmendel
Member

@jbrockmendel jbrockmendel commented May 28, 2021

Tests copied from #41701; I think this gets at the root problem. cc @simonjayhawkins

@jreback jreback added the Reduction Operations sum, mean, min, max, etc. label May 31, 2021
@jreback jreback added this to the 1.3 milestone May 31, 2021
@jreback
Contributor

jreback commented May 31, 2021

a regression on master right?

@simonjayhawkins
Member

a regression on master right?

no, in 1.2.4: #41074

@jreback
Contributor

jreback commented May 31, 2021

ok so this seems backportable.

@simonjayhawkins
Member

we should include the release note here.

@jbrockmendel
Member Author

just copied over the whatsnew note from #41701

@jreback
Contributor

jreback commented May 31, 2021

ok so plan is to merge this to master, then the backport to 1.2.5

@simonjayhawkins
Member

ok so plan is to merge this to master, then the backport to 1.2.5

I could maybe test that this is backportable first (that will be tomorrow) to be sure. Or we go ahead, merge and backport, and sort out issues (if any) afterwards.

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this pull request Jun 1, 2021
@simonjayhawkins
Member

ok so plan is to merge this to master, then the backport to 1.2.5

I could maybe test that this is backportable first (that will be tomorrow) to be sure. Or we go ahead, merge and backport, and sort out issues (if any) afterwards.

#41758 - all seems good (test failures look unrelated)

@jreback jreback merged commit f7dd14b into pandas-dev:master Jun 1, 2021
@jreback
Contributor

jreback commented Jun 1, 2021

@meeseeksdev backport 1.2.x

simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this pull request Jun 1, 2021
TLouf pushed a commit to TLouf/pandas that referenced this pull request Jun 1, 2021
@jbrockmendel jbrockmendel deleted the regr-41701 branch June 1, 2021 16:04
simonjayhawkins added a commit that referenced this pull request Jun 1, 2021
Co-authored-by: jbrockmendel <jbrockmendel@gmail.com>
@@ -245,8 +245,7 @@ def _maybe_get_mask(
     """
     if mask is None:
         if is_bool_dtype(values.dtype) or is_integer_dtype(values.dtype):
-            # Boolean data cannot contain nulls, so signal via mask being None
-            return None
+            return np.broadcast_to(False, values.shape)
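
For readers unfamiliar with the replacement line in the hunk above, here is a minimal sketch of what `np.broadcast_to(False, values.shape)` produces: an all-False boolean mask with the same shape as `values`, exposed as a read-only view over a single element rather than a fully allocated array. It assumes only NumPy; the array `values` is made-up example data, not taken from the PR.

```python
import numpy as np

# Made-up integer data; integer and boolean arrays cannot hold NaN.
values = np.arange(12, dtype=np.int64).reshape(3, 4)

# All-False mask with the same shape as `values`.
mask = np.broadcast_to(False, values.shape)

print(mask.shape)            # same shape as `values`
print(mask.dtype)            # bool
print(mask.strides)          # zero strides: one element is reused everywhere
print(mask.flags.writeable)  # the view is read-only

# Code that expects a mask can now compute per-axis counts,
# even though no value is actually null.
non_null_per_column = mask.shape[0] - mask.sum(axis=0)
print(non_null_per_column)
```

The mask is cheap in memory, but a reduction such as `mask.sum(axis=0)` still visits every logical element, so it is not free in time.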
Member

@jorisvandenbossche jorisvandenbossche Jun 8, 2021

@jbrockmendel I didn't check the rest of the PR, but was this change a crucial part of the fix?

There are a bunch of benchmarks showing a slowdown (almost all Ops benchmarks involving integer and boolean data), which might be caused by this (I didn't check to be sure).

See https://pandas.pydata.org/speed/pandas/#regressions?sort=1&dir=desc, scroll a bit down until "2021-06-02 00:29"

Member Author

but was this change a crucial part of the fix?

yes

Member

The reason this seemed necessary was that _maybe_null_out was returning a wrong shape when the mask was None (see #41920).
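
The shape problem described in the comment above can be sketched with a simplified, hypothetical stand-in for a min_count check. This is not the pandas implementation of `_maybe_null_out`; the names `null_out_naive` and `null_out_shape_aware`, their signatures, and the sample data are invented for illustration. The sketch assumes a helper whose per-axis branch runs only when a mask is present and whose fallback returns a scalar.

```python
import numpy as np


def null_out_naive(result, axis, mask, shape, min_count):
    """Hypothetical: only the masked branch preserves the result shape."""
    if mask is not None and axis is not None:
        valid = mask.shape[axis] - mask.sum(axis)
        result = result.astype("float64")
        result[valid < min_count] = np.nan
        return result
    # Fallback treats the whole array as one block and returns a scalar.
    total_valid = int(np.prod(shape)) if mask is None else int(mask.size - mask.sum())
    return np.nan if total_valid < min_count else result


def null_out_shape_aware(result, axis, mask, shape, min_count):
    """Hypothetical: with no mask, every entry along `axis` counts as valid."""
    if axis is None:
        return null_out_naive(result, axis, mask, shape, min_count)
    if mask is None:
        valid = np.full(result.shape, shape[axis])
    else:
        valid = mask.shape[axis] - mask.sum(axis)
    result = result.astype("float64")
    result[valid < min_count] = np.nan
    return result


values = np.array([[1, 2], [3, 4]])  # made-up integer data, no nulls
sums = values.sum(axis=0)            # one sum per column

naive = null_out_naive(sums, 0, None, values.shape, min_count=5)
all_false = np.broadcast_to(False, values.shape)
with_mask = null_out_naive(sums, 0, all_false, values.shape, min_count=5)
aware = null_out_shape_aware(sums, 0, None, values.shape, min_count=5)

print(np.shape(naive))  # the per-column shape is lost
print(with_mask.shape)  # an explicit all-False mask keeps one entry per column
print(aware.shape)      # counting from the shape also keeps one entry per column
```

Under these assumptions there are two ways to keep the result's shape: always pass a mask, as the hunk in this PR does, or handle `mask is None` inside the null-out step so that integer and boolean data can skip the mask work.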

Successfully merging this pull request may close these issues.

BUG: Reduction operations fail