Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: reduction operations failing if min_count is larger #40143

Conversation

simonjayhawkins
Copy link
Member

@simonjayhawkins simonjayhawkins added the Reduction Operations sum, mean, min, max, etc. label Mar 1, 2021
@simonjayhawkins simonjayhawkins added this to the 1.2.3 milestone Mar 1, 2021
@simonjayhawkins
Copy link
Member Author

i'll add a test and whatsnew shortly, if we want to go this route for backport purposes. This fix does not have much value in isolation as the regression #39738 (comment) is not yet addressed here.

@simonjayhawkins
Copy link
Member Author

The regression this fixes is caused by

        if np.ndim(result) == 0:
            # TODO(EA2D): special case not needed with 2D EAs
            res_values = np.array([[result]])
        else:
            res_values = result.reshape(-1, 1)

in pandas/core/internals/blocks.py

this does not account for the fact that _maybe_null_out (and therefore nansum) may return a scalar.

nansum, _maybe_null_out and check_below_min_count need some work if they are to be called with 2d arrays, but this is outside the scope of a backport PR.

I still need to track down the earlier regression, where min_count above the row/column length but below the 2d shape does not work as intended, i.e. does nothing

@jreback
Copy link
Contributor

jreback commented Mar 1, 2021

i don't think we should block on this for 1.2.3. This patch may involve some non-trivial work.

@simonjayhawkins
Copy link
Member Author

i don't think we should block on this for 1.2.3. This patch may involve some non-trivial work.

sure. a 1.2.4 milestone has been created. easily moved.

@jreback
Copy link
Contributor

jreback commented Mar 1, 2021

i don't think we should block on this for 1.2.3. This patch may involve some non-trivial work.

sure. a 1.2.4 milestone has been created. easily moved.

great just don't want to block (and i don't see anything urgent to prevent release).

@simonjayhawkins
Copy link
Member Author

and i don't see anything urgent to prevent release

I was thinking maybe wait till #40144 is fixed. also affects backport. but we will likely have another release before the np-dev issues cause mainstream issues.

@jreback
Copy link
Contributor

jreback commented Mar 1, 2021

and i don't see anything urgent to prevent release

I was thinking maybe wait till #40144 is fixed. also affects backport. but we will likely have another release before the np-dev issues cause mainstream issues.

cool

@simonjayhawkins simonjayhawkins modified the milestones: 1.2.3, 1.2.4 Mar 2, 2021
@simonjayhawkins
Copy link
Member Author

This fix does not have much value in isolation as the regression #39738 (comment) is not yet addressed here.

this seems to be a long standing behavior. This seems non-intuitive to me since I expected min_count to apply column-wise/row-wise if axis is specified.

@jreback jreback merged commit 782dc18 into pandas-dev:master Mar 5, 2021
@jreback
Copy link
Contributor

jreback commented Mar 5, 2021

thanks @simonjayhawkins

its possible we want to refactor this for 1.3, but for later, cc @jbrockmendel

@jreback
Copy link
Contributor

jreback commented Mar 5, 2021

@meeseeksdev backport 1.2.x

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Mar 5, 2021
@simonjayhawkins simonjayhawkins deleted the reduction-operations-failing-if-`min_count`-is-larger branch March 5, 2021 15:49
simonjayhawkins added a commit that referenced this pull request Mar 5, 2021
…g if `min_count` is larger) (#40237)

Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
@rhshadrach rhshadrach mentioned this pull request Apr 23, 2021
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reduction Operations sum, mean, min, max, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: reduction operations failing if min_count is larger
3 participants