DEPR: groupby numeric_only default #47025

Merged on May 18, 2022 (4 commits into pandas-dev:main)

rhshadrach
Member

There are two cases in which we want to emit a deprecation warning for DataFrameGroupBy:

  • numeric_only is not specified and columns get dropped. In this case, emit a warning that the default of numeric_only will change to False in the future.
  • numeric_only is specified as False and columns still get dropped. In this case, emit a warning that the op will raise in the future.

@rhshadrach rhshadrach added Groupby Deprecate Functionality to remove in pandas Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply labels May 14, 2022
@rhshadrach rhshadrach added this to the 1.5 milestone May 14, 2022
Contributor

@jreback jreback left a comment

not a fan of filterwarnings, i know it's a bit annoying but can you either explicitly test or just pass numeric_only=False?

libgroupby.group_var,
cython_dtype=np.dtype(np.float64),
numeric_only=numeric_only,
needs_counts=True,
post_processing=lambda vals, inference: np.sqrt(vals),
ddof=ddof,
)
if (
self.axis != 1
Contributor

can you make a helper for this rather than repeating?

@@ -81,6 +81,7 @@ def get_stats(group):
assert result.index.names[0] == "C"


@pytest.mark.filterwarnings("ignore:.*value of numeric_only.*:FutureWarning")
Contributor

can you explicitly test these rather than filtering (alternatively, pass numeric_only=False as needed)?
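
For illustration, explicit testing here means asserting that the expected FutureWarning is raised around the one call that triggers it, instead of silencing it for the whole test with a filterwarnings mark. The sketch below uses only the standard library; `grouped_sum_default` is a hypothetical stand-in for a groupby op that warns, and `call_and_check_warning` is an assumed helper name. In a pytest suite, `pytest.warns(FutureWarning, match=...)` serves the same purpose.

```python
import warnings


def grouped_sum_default():
    # Hypothetical stand-in for an op that warns about the numeric_only default.
    warnings.warn(
        "The default value of numeric_only in DataFrameGroupBy.sum is deprecated.",
        FutureWarning,
        stacklevel=2,
    )
    return 42


def call_and_check_warning(func, match):
    """Call func, assert it emitted a FutureWarning containing match, return its result."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown earlier in the process.
        warnings.simplefilter("always")
        result = func()
    messages = [
        str(w.message) for w in caught if issubclass(w.category, FutureWarning)
    ]
    assert any(match in m for m in messages), f"no FutureWarning containing {match!r}"
    return result
```

Unlike a blanket filter, this fails the test if the warning stops being emitted, so the deprecation itself stays covered.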

@rhshadrach
Member Author

Thanks @jreback; filterwarnings has been removed and the helper has been added.

@jreback jreback merged commit 7c054d6 into pandas-dev:main May 18, 2022
@jreback
Contributor

jreback commented May 18, 2022

very nice @rhshadrach

@rhshadrach rhshadrach deleted the depr_groupby_numeric_only branch May 18, 2022 12:57
@twoertwein
Member

I think this causes the doc build to fail with several of these warnings:

:1: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
(bb.groupby(['year', 'team']).sum()

Would probably need to update the documentation to avoid these FutureWarnings.
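
As a rough sketch of what such a documentation update could look like, the warning text itself suggests two options: specify numeric_only explicitly, or select only the columns that are valid for the function. The frame below is made-up stand-in data, not the dataset used in the pandas docs, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical stand-in data; not the dataset used in the pandas docs.
df = pd.DataFrame(
    {
        "year": [2006, 2006, 2007],
        "team": ["A", "A", "B"],
        "player": ["x", "y", "z"],  # non-numeric column that would be dropped
        "hits": [10, 20, 30],
    }
)

# Option 1: state numeric_only explicitly so the default is never consulted.
explicit = df.groupby(["year", "team"]).sum(numeric_only=True)

# Option 2: select only the columns that are valid for the op.
selected = df.groupby(["year", "team"])[["hits"]].sum()
```

Both calls give the same result on this frame; the second makes the intended columns visible at the call site.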

@rhshadrach
Member Author

Thanks @twoertwein - will do a follow up.

mroeschke added a commit that referenced this pull request May 25, 2022
* TYP: NoDefault

* ix mypy issues; re-write isinstance(..., NoDefault)

* remove two more casts

* ENH: DatetimeArray fields support non-nano (#47044)

* DEPR: groupby numeric_only default (#47025)

* DOC: Clarify decay argument validation in ewm when times is provided (#47026)

* DOC: Fix some typos in pandas/. (#47022)

* remove two more casts

* avoid cast-like annotation

* left/right

* cannot use |

Co-authored-by: jbrockmendel <jbrockmendel@gmail.com>
Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <emailformattr@gmail.com>
Co-authored-by: Shuangchi He <34329208+Yulv-git@users.noreply.github.com>
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
* TYP: NoDefault

* ix mypy issues; re-write isinstance(..., NoDefault)

* remove two more casts

* ENH: DatetimeArray fields support non-nano (pandas-dev#47044)

* DEPR: groupby numeric_only default (pandas-dev#47025)

* DOC: Clarify decay argument validation in ewm when times is provided (pandas-dev#47026)

* DOC: Fix some typos in pandas/. (pandas-dev#47022)

* remove two more casts

* avoid cast-like annotation

* left/right

* cannot use |

Co-authored-by: jbrockmendel <jbrockmendel@gmail.com>
Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <emailformattr@gmail.com>
Co-authored-by: Shuangchi He <34329208+Yulv-git@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

DEPR: DataFrameGroupBy numeric_only defaulting to True
3 participants