Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: fix DataFrame sum and prod with min_count and numeric_only #41701

Conversation

simonjayhawkins
Copy link
Member

it appears that the clean-up in #35899 to simplify _reduce may have been a bit premature, have reinstated a few lines of the removed code for backport purposes.

The longer term fix would probably be to support min_count in _mgr.reduce, #40143 (comment)

@simonjayhawkins simonjayhawkins added Regression Functionality that used to work in a prior pandas version Numeric Operations Arithmetic, Comparison, and Logical operations Reduction Operations sum, mean, min, max, etc. Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply labels May 28, 2021
@simonjayhawkins simonjayhawkins added this to the 1.2.5 milestone May 28, 2021
@@ -9856,36 +9856,42 @@ def _get_data() -> DataFrame:

return out

assert numeric_only is None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should still hold?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i guess it should be assert numeric_only is None or min_count != 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct we are only using the old code path when min_count is given until _mgr.reduce can handle min_count.

why do we need to add an assert?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed, just took me a minute to understand what had changed

@jreback
Copy link
Contributor

jreback commented May 31, 2021

this overlaps with #41706 ?

@jreback
Copy link
Contributor

jreback commented May 31, 2021

ok so this is an alt to #41711 ?

@simonjayhawkins
Copy link
Member Author

this overlaps with #41706 ?

#41711? yes and no. we could just do #41711 and not backport. do this, backport this and then merge #41711. or merge #41711 and I reopen this directly against 1.2.x

those options don't include do #41711, backport it and close this one.

@simonjayhawkins
Copy link
Member Author

ok so this is an alt to #41711 ?

no really, this intended as a backport? #41711 was opened after is the alt.

@jreback
Copy link
Contributor

jreback commented May 31, 2021

this overlaps with #41706 ?

#41711? yes and no. we could just do #41711 and not backport. do this, backport this and then merge #41711. or merge #41711 and I reopen this directly against 1.2.x

those options don't include do #41711, backport it and close this one.

#41711 and I reopen this directly against 1.2.x

why is this not the preferred option?

@simonjayhawkins
Copy link
Member Author

#41711 and I reopen this directly against 1.2.x

why is this not the preferred option?

because we normally merge to master and backport. If #41711 gets merged first. then our options reduce.

@simonjayhawkins
Copy link
Member Author

closing in favor of #41711 (comment)

The change here was effectively only

        else:
            if numeric_only:
                data = _get_data()
                labels = data._get_agg_axis(axis)
                values = data.values
            result = func(values)

when min_count and numeric_only is given, without touching any other library code

@simonjayhawkins simonjayhawkins deleted the min_count-regression branch May 31, 2021 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply Numeric Operations Arithmetic, Comparison, and Logical operations Reduction Operations sum, mean, min, max, etc. Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Reduction operations fail
3 participants