API: Consistently support numeric_only in groupby ops #56946

Open
17 tasks
rhshadrach opened this issue Jan 18, 2024 · 3 comments
Labels
API - Consistency Internal Consistency of API/Behavior Groupby

Comments

@rhshadrach
Member

rhshadrach commented Jan 18, 2024

Inspecting the various groupby ops, I think the following are cases where we should have numeric_only.

It makes sense for the following methods to have numeric_only, but they won't fail on any input (or on any hashable input, for nunique and value_counts), so I think it's okay if they don't. It would still be nice to have.

  • all
  • any
  • bfill
  • count
  • ffill
  • nunique
  • value_counts
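
As a rough illustration of the distinction drawn above, the sketch below uses a small made-up frame (the column names `key`, `num`, and `txt` are assumptions, not from the issue). It shows `count` and `nunique` running on a string column without complaint, while `mean` is given `numeric_only=True` so that the string column is left out.

```python
import pandas as pd

# Hypothetical example data: one numeric column and one string column.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "num": [1.0, 2.0, 3.0],
        "txt": ["x", "y", "y"],
    }
)
gb = df.groupby("key")

# count and nunique accept non-numeric columns as they are.
counts = gb.count()
uniques = gb.nunique()

# mean needs numeric data, so numeric_only=True drops the "txt" column.
means = gb.mean(numeric_only=True)

print(counts)
print(uniques)
print(means)
```

Here `counts` and `uniques` keep both `num` and `txt`, whereas `means` contains only `num`.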

The following methods should not get a numeric_only argument. They fall into a few typical camps: filters, plotting, or methods that do not depend on the columns (e.g. cumcount and size).

  • boxplot
  • cumcount
  • describe # Handled by include="all"
  • filter
  • head
  • hist
  • nth
  • pipe
  • plot
  • shift
  • size
  • tail
  • take
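
To make the "do not depend on the columns" point concrete, here is a small sketch on made-up data (the frame and its column names are assumptions). `size` and `cumcount` are computed from the grouping alone, so selecting a different subset of columns does not change the result.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "num": [1.0, 2.0, 3.0],
        "txt": ["x", "y", "y"],
    }
)
gb = df.groupby("key")

# Number of rows per group; no column values are inspected.
sizes = gb.size()

# Position of each row within its group, again independent of column values.
positions = gb.cumcount()

# Restricting to the non-numeric column gives the same sizes.
sizes_txt_only = df.groupby("key")[["txt"]].size()

print(sizes)
print(positions)
```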
@rhshadrach rhshadrach added Groupby API - Consistency Internal Consistency of API/Behavior labels Jan 18, 2024
@kwhkim
Contributor

kwhkim commented Mar 16, 2024

Looking at all the methods that the numeric_only= argument would need to support, I think...

Wouldn't it be better to have an independent method, .numeric_only(), to select the columns with numeric dtypes? A numeric_only= argument can conflict with a parameter of the same name on the function passed to .agg().

@rhshadrach
Member Author

One can already do DataFrame.select_dtypes('number'). I think we should strive for consistency of arguments between DataFrame and groupby ops where it makes sense, and this is one of those cases. So unless we're going to deprecate numeric_only across the board, I'm still positive on including them in groupby.
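
A minimal sketch of the select_dtypes route, on made-up data (the frame and column names are assumptions). pandas documents `"number"` as the selector string for all numeric dtypes. Because selecting numeric columns also drops a string grouping column, the sketch groups by the original `key` Series rather than by column name.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "num": [1.0, 2.0, 3.0],
        "txt": ["x", "y", "y"],
    }
)

# Keep only numeric columns, then group by the key Series taken from the
# original frame (the "key" column itself is not numeric and is dropped).
numeric_means = df.select_dtypes("number").groupby(df["key"]).mean()

# For comparison: the same result via the numeric_only argument.
same_means = df.groupby("key").mean(numeric_only=True)

print(numeric_means)
```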

@kwhkim
Contributor

kwhkim commented Mar 21, 2024

But the purpose was to select numeric columns from a DataFrameGroupBy. I think .select_dtypes('number') is applicable only to a DataFrame object. Or we might also add a select_dtypes() method to DataFrameGroupBy.

For the dependency problem, I would suggest keeping it (numeric_only) and also accepting *args and **kwargs for the functions to be applied (I mean in cases like .mean() or .std(), because .mean(skipna=False) does not work). When you think about it, numeric_only is not such a popular parameter name, and we can urge users to avoid it. I wonder whether any function actually uses the parameter name numeric_only; if one does, we can always make a custom function with lambda x: f(x, numeric_only=True) or something similar.
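
The lambda workaround mentioned above could look roughly like the following. `spread` is a hypothetical user function, invented here only because its keyword is named `numeric_only`; the data and column names are likewise assumptions. Since `.agg()` forwards extra keyword arguments to the function it applies, wrapping the call in a lambda binds the keyword explicitly and sidesteps any name clash.

```python
import pandas as pd


def spread(values, numeric_only=False):
    """Hypothetical helper whose keyword happens to be called numeric_only."""
    if numeric_only:
        # Coerce to numbers and discard anything that cannot be converted.
        values = pd.to_numeric(values, errors="coerce").dropna()
    return values.max() - values.min()


# Hypothetical example data.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "num": [1.0, 2.0, 3.0],
    }
)

# The lambda fixes numeric_only for the user function, so nothing named
# numeric_only has to be passed through .agg() itself.
result = df.groupby("key")["num"].agg(lambda s: spread(s, numeric_only=True))

print(result)
```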
