ENH: Set numeric_only=True in aggregations #34521

Closed
WillAyd opened this issue Jun 1, 2020 · 5 comments
Labels
Deprecate (Functionality to remove in pandas), Duplicate Report (Duplicate issue or pull request)

Comments

@WillAyd
Member

WillAyd commented Jun 1, 2020

I've noticed that I make this mistake quite often:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"a": ["1"] * 3, "b": np.ones(3)})
>>> df.sum()
a    111
b      3
dtype: object

Getting 111 as the result for column a is harmless in this example, but it is quite annoying in most real-life use cases, where concatenating a string column can produce exceedingly large strings that exhaust memory or tie up the interpreter.
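To make the scaling concern concrete, here is a minimal sketch with assumed sizes (10,000 rows of 20-character strings, chosen only for illustration). The length of the concatenated result grows with the total number of characters in the column:

```python
import numpy as np
import pandas as pd

# Assumed sizes for illustration only; real data can be far larger.
n_rows, width = 10_000, 20

df = pd.DataFrame(
    {
        # dtype=object is set explicitly so the column holds plain Python strings.
        "a": pd.Series(["x" * width] * n_rows, dtype=object),
        "b": np.ones(n_rows),
    }
)

result = df.sum()      # column "a" is concatenated, column "b" is added up
joined = result["a"]
print(len(joined))     # n_rows * width characters in a single string
```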

The numeric_only argument gets around this issue:

>>> df.sum(numeric_only=True)
b    3.0
dtype: float64

Though I'm curious whether this should really be the default.
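For comparison, a rough sketch of two ways to restrict the reduction to numeric columns. `select_dtypes` is not mentioned in the comment above; it is shown here only as an assumed alternative for callers who prefer to filter columns explicitly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": pd.Series(["1"] * 3, dtype=object), "b": np.ones(3)})

# Option 1: let the reduction drop non-numeric columns itself.
via_kwarg = df.sum(numeric_only=True)

# Option 2: filter columns by dtype first, then reduce.
via_select = df.select_dtypes(include="number").sum()

print(via_kwarg)
print(via_select)
```

Both options leave column a out of the result; the second makes the column filtering visible at the call site.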

@WillAyd added the Enhancement, Needs Triage, Deprecate, and Needs Discussion labels and removed the Needs Triage label on Jun 1, 2020
@jorisvandenbossche
Member

There should already be some issues about this, e.g. I remember @jbrockmendel raising this recently, and I suspect there are some older issues as well.

@jorisvandenbossche
Member

The one I remembered is #28900. I think this can be closed as a duplicate?

@jorisvandenbossche added the Duplicate Report label and removed the Enhancement and Needs Discussion labels on Jun 2, 2020
@jorisvandenbossche added this to the No action milestone on Jun 2, 2020
@WillAyd
Member Author

WillAyd commented Jun 2, 2020

It's a little bit different. That PR is about removing the kwarg, whereas this is about changing the default value.

The former would encapsulate the latter, but we may want to do it in steps instead of outright removal.
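One way to picture a change "in steps" is the sketch below. `staged_sum` is a hypothetical wrapper written for this illustration, not a pandas API and not something proposed in the thread: it keeps the keyword, preserves the current behaviour when the caller omits it, and emits a `FutureWarning` so callers can opt in before a default flips.

```python
import warnings

import numpy as np
import pandas as pd


def staged_sum(df, numeric_only=None):
    """Hypothetical wrapper (not a pandas API): warn before a default changes."""
    if numeric_only is None:
        # First step of a staged change: keep the current behaviour, but tell
        # callers that the default may flip so they can pass it explicitly.
        warnings.warn(
            "numeric_only was not specified; the default may change to True",
            FutureWarning,
            stacklevel=2,
        )
        numeric_only = False
    return df.sum(numeric_only=numeric_only)


df = pd.DataFrame({"a": pd.Series(["1"] * 3, dtype=object), "b": np.ones(3)})

# Explicit opt-in: no warning, only numeric columns are reduced.
explicit = staged_sum(df, numeric_only=True)

# Keyword omitted: a FutureWarning is recorded and column "a" is still concatenated.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = staged_sum(df)
```

Removing the keyword outright would skip this intermediate step, which is the distinction drawn above.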

@WillAyd reopened this on Jun 2, 2020
@jorisvandenbossche
Member

jorisvandenbossche commented Jun 2, 2020

OK, that's right, but I would still propose using the other issue for that discussion. In the end it is a single discussion about what to do with this keyword: keep it as is, change the default, or fully remove it. They are not really separate discussions, are they?
So I would put your proposal to change the default (instead of removing the kwarg) in #28900.

@WillAyd
Member Author

WillAyd commented Jun 2, 2020

Sure, that works just as well.

@WillAyd closed this as completed on Jun 2, 2020