[FEA] Support the min_count argument in groupby aggregations #9009

Open
shwina opened this issue Aug 10, 2021 · 4 comments
Labels
feature request (New feature or request), Python (Affects Python cuDF API)

Comments

@shwina
Contributor

shwina commented Aug 10, 2021

Updated 5/13/2024: numeric_only is now supported (as of #10629). min_count is not yet supported.

In Pandas, groupby aggregations (e.g., max) accept the following arguments:

  • min_count: the minimum number of non-null values required per group in order for the result to be non-null
  • numeric_only: only aggregate numeric columns

It would be nice for cuDF to support these as well:

In [6]: df = cudf.DataFrame({'a': [1, 1, 1, 2, 2], 'b': ['a', 'b', 'c', 'd', 'e'], 'c': [1, 2, 3, 4, 5]})

In [7]: df
Out[7]:
   a  b  c
0  1  a  1
1  1  b  2
2  1  c  3
3  2  d  4
4  2  e  5

In [8]: df.groupby('a').max(numeric_only=True)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-612714d07c42> in <module>
----> 1 df.groupby('a').max(numeric_only=True)

TypeError: max() got an unexpected keyword argument 'numeric_only'

In [9]: df.to_pandas().groupby('a').max(numeric_only=True)
Out[9]:
   c
a
1  3
2  5
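
The session above only exercises numeric_only. For min_count, here is a minimal pandas sketch of the behavior the issue asks cuDF to match. The DataFrame reuses the grouping key from the example, but column c is altered (an assumption made for illustration) so that group 2 has only one non-null value.

```python
import pandas as pd

# Same grouping key as the issue's example; column "c" gets one missing value
# so that group 2 has a single non-null entry.
df = pd.DataFrame({"a": [1, 1, 1, 2, 2], "c": [1.0, 2.0, 3.0, 4.0, None]})

# Default (min_count=-1): a group with at least one non-null value gets a result.
default_max = df.groupby("a")["c"].max()

# min_count=2: a group needs at least two non-null values,
# otherwise its result is null.
strict_max = df.groupby("a")["c"].max(min_count=2)

print(default_max)
print(strict_max)
```

With min_count=2, group 1 (three non-null values) keeps its maximum, while group 2 (one non-null value) comes back as null.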
shwina added the feature request and Needs Triage labels on Aug 10, 2021
shwina added the Python label and removed the Needs Triage label on Aug 10, 2021
@shwina
Contributor Author

shwina commented Aug 10, 2021

@vyasr assigned you based on our chat offline :)

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

rapids-bot bot pushed a commit that referenced this issue Apr 14, 2022
Add support for numeric_only in DataFrame._reduce, so that calls such as df.mean(numeric_only=True) work. Resolves #2067. Also partially addresses #9009.

Authors:
  - https://github.com/martinfalisse

Approvers:
  - Michael Wang (https://github.com/isVoid)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #10629
@vyasr
Contributor

vyasr commented May 13, 2024

As of #10629 the numeric_only functionality is addressed. min_count remains unsupported for now.
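
Until min_count is available, one possible workaround is to compute the per-group non-null count separately and mask the aggregate with it. The sketch below is written against pandas so it can be run anywhere; groupby_max_with_min_count is a hypothetical helper name, not part of either library. It relies only on groupby count(), groupby max(), and Series.where, which cuDF is assumed to provide with matching semantics; that assumption has not been verified on a GPU here.

```python
import pandas as pd


def groupby_max_with_min_count(df, by, column, min_count):
    """Hypothetical helper: emulate max(min_count=...) using count() and where()."""
    grouped = df.groupby(by)[column]
    counts = grouped.count()  # number of non-null values per group
    maxima = grouped.max()    # per-group maximum, ignoring nulls
    # Keep the maximum only where the group has enough non-null values.
    return maxima.where(counts >= min_count)


df = pd.DataFrame({"a": [1, 1, 1, 2, 2], "c": [1.0, 2.0, 3.0, 4.0, None]})

emulated = groupby_max_with_min_count(df, "a", "c", min_count=2)
expected = df.groupby("a")["c"].max(min_count=2)  # native pandas behavior

print(emulated)
```

The extra count() pass costs a second aggregation, but it keeps the logic to calls that do not depend on min_count being implemented.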

vyasr changed the title from "[FEA] Support the min_count and numeric_only arguments in groupby aggregations" to "[FEA] Support the min_count argument in groupby aggregations" on May 13, 2024
@vyasr vyasr removed their assignment May 13, 2024