
Updating _resolve_numeric_only function of GroupBy #43154

Merged: 38 commits merged into pandas-dev:master on Sep 9, 2021

Conversation

@kurchi1205 (Contributor) commented Aug 21, 2021

I am absolutely new to contributing to open source, so please guide me through it.

@pep8speaks commented Aug 21, 2021

Hello @kurchi1205! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-09-09 16:40:13 UTC

Updated PEP8 issues in groupby.py
Solved PEP8 issues in test_aggregate
@@ -113,7 +113,19 @@ def test_groupby_aggregation_mixed_dtype():
g = df.groupby(["by1", "by2"])
result = g[["v1", "v2"]].mean()
tm.assert_frame_equal(result, expected)
expected2 = DataFrame(
Contributor

make a new test

Contributor Author

I will work on that tomorrow

Contributor Author

I have developed a new test case.
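
For context, a minimal sketch of the kind of test under discussion (illustrative names and values, not the actual test added in this PR):

import pandas as pd
import pandas._testing as tm

def test_groupby_min_keeps_object_column():
    # Mixed-dtype frame: one numeric and one object column
    df = pd.DataFrame(
        {"key": ["a", "a", "b"], "num": [1, 2, 3], "txt": ["x", "y", "z"]}
    )
    # With numeric_only=False the object column takes part in the
    # reduction instead of being silently dropped
    result = df.groupby("key").min(numeric_only=False)
    expected = pd.DataFrame(
        {"num": [1, 3], "txt": ["x", "z"]},
        index=pd.Index(["a", "b"], name="key"),
    )
    tm.assert_frame_equal(result, expected)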

@@ -1119,7 +1120,24 @@ def _resolve_numeric_only(self, numeric_only: bool | lib.NoDefault) -> bool:
# i.e. not explicitly passed by user
if self.obj.ndim == 2:
# i.e. DataFrameGroupBy
numeric_only = True
# Checking if the dataframe has non-numeric features
Contributor

this is very complicated, see if you can do this in a simpler way

Contributor Author

OK, working on it.

Contributor Author

I have updated a simpler version

Member

I'll need to look at the linked issue more closely, but I'm very skeptical that this method is the right place for a fix.

Contributor Author

Explicitly passing numeric_only=False solves the error, so I looked into the _resolve_numeric_only function.
It only checked for a DataFrameGroupBy object and set numeric_only to True, which forced the non-numeric columns out of the aggregation. That's why I included checks for a mixed-dtype DataFrame.
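
To make the behaviour being discussed concrete, a small illustration (not code from the PR): with numeric_only=True non-numeric columns are dropped from the result, while numeric_only=False lets them take part in reductions that support them.

import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "a", "b"], "num": [1, 2, 3], "txt": ["x", "y", "z"]}
)
gb = df.groupby("key")

# numeric_only=True: the object column "txt" is excluded from the result
print(gb.min(numeric_only=True).columns.tolist())   # ['num']

# numeric_only=False: "txt" is aggregated as well
print(gb.min(numeric_only=False).columns.tolist())  # ['num', 'txt']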

Member

The point of numeric_only is to force non-numeric columns out of the aggregation. Changing the behavior here "fixed" the case you tested, but it also changed a bunch of other behavior and broke a bunch of other tests.

Contributor Author

OK, I get your point. I will check for numeric columns; only if there are none should it set numeric_only to False. That should solve the issue and not violate the documentation.

Member

I'm not sure you do.

numeric_only is an argument passed by the user, or defaulting to True if not explicitly passed. We should never be changing it based on what dtypes are actually present.
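
For readers unfamiliar with the mechanism being described: pandas uses a no_default sentinel so a method can tell "not passed by the user" apart from an explicit True or False. A simplified, generic sketch of that pattern (not the actual pandas implementation):

class _NoDefault:
    """Sentinel meaning 'the caller did not pass this argument'."""

    def __repr__(self) -> str:
        return "<no_default>"


no_default = _NoDefault()


def resolve_numeric_only(numeric_only, is_frame: bool) -> bool:
    # Substitute a default only when the user passed nothing;
    # an explicit True/False is always honoured as given.
    if numeric_only is no_default:
        return is_frame  # frame-like groupbys default to True
    return numeric_only


print(resolve_numeric_only(no_default, is_frame=True))  # True (default applied)
print(resolve_numeric_only(False, is_frame=True))       # False (explicit value wins)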

Contributor Author

But why can't the default value be set by checking the datatypes? Why does it have to always be True?

@@ -10,6 +10,7 @@ class providing the base-class of operations.

from contextlib import contextmanager
import datetime
import numpy as np
Member

this is already imported below

@jbrockmendel (Member)

Commented on #43108, I think this is correct as-is.

Checks for the presence of numeric features in columns to be aggregated
Adding a new test case of only non_numeric features
Checking for empty dataframes passed in groupby
@jbrockmendel (Member)

Changing the default is something we can do, but that is an API change, not a bugfix, and would require a deprecation cycle.

@kurchi1205 (Contributor Author)

OK, so how do I go about that? I don't know how that works.

@jbrockmendel (Member)

> OK, so how do I go about that? I don't know how that works.

First there needs to be consensus that we want to change the behavior. That discussion seems to be in #43108. Then a PR that doesn't change the behavior, but does issue a warning that the behavior will change in a future version, e.g. #42738
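
As a rough illustration of what "issue a warning" means here (a generic sketch under assumed names, not the pandas code): the current behaviour is kept, but a FutureWarning tells users that the default will change.

import warnings

import pandas as pd


def mean_with_deprecation_warning(df: pd.DataFrame, numeric_only=None):
    # Behaviour is unchanged for now, but users are told the default will change.
    if numeric_only is None:
        warnings.warn(
            "The default value of numeric_only will change in a future version; "
            "pass numeric_only explicitly to silence this warning.",
            FutureWarning,
            stacklevel=2,
        )
        numeric_only = True
    return df.mean(numeric_only=numeric_only)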

@kurchi1205 (Contributor Author)

OK, so I will create a different pull request and issue a warning in groupby.py.

@simonjayhawkins (Member)

@kurchi1205 I've closed #43108 as a duplicate of #42395.

but it's okay to have tests for both issue reports. Can you also add the code sample from #42395?

@simonjayhawkins (Member) left a comment

@kurchi1205 can you add a release note.

@@ -1102,14 +1102,11 @@ def _wrap_applied_output(self, data, keys, values, not_indexed_same: bool = Fals
def _resolve_numeric_only(self, numeric_only: bool | lib.NoDefault) -> bool:
"""
Determine subclass-specific default value for 'numeric_only'.

Member

can you restore this whitespace. docstrings should start with a one-liner summary distinct from the expanded description.
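
That is, the numpydoc convention of a short summary line, a blank line, then the extended description, e.g. (reassembled from the diff shown in this thread):

def _resolve_numeric_only(self, numeric_only):
    """
    Determine subclass-specific default value for 'numeric_only'.

    For SeriesGroupBy we want the default to be False (to match Series behavior).
    For DataFrameGroupBy we want it to be True (for backwards-compat).
    """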

Contributor

I"ve done this in 45f54d6

For SeriesGroupBy we want the default to be False (to match Series behavior).
For DataFrameGroupBy we want it to be True (for backwards-compat).

Member

restore this one too.

Contributor

I"ve done this in 45f54d6

Parameters
----------
numeric_only : bool or lib.no_default

Member

and this

Contributor

I"ve done this in 45f54d6


# error: Incompatible return value type (got "Union[bool, NoDefault]",
# expected "bool")
return numeric_only # type: ignore[return-value]
Member

can you restore this too. The mypy error needs to be ignored as mypy does not know that lib.no_default is a singleton.
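
The underlying typing issue, sketched generically (illustrative, not the pandas source): the parameter is annotated as a union with the sentinel type, and after the identity check mypy does not narrow it to bool, because it cannot know there is only one NoDefault instance.

from typing import Union


class NoDefault:
    pass


no_default = NoDefault()


def resolve(flag: Union[bool, NoDefault]) -> bool:
    if flag is no_default:
        return True
    # mypy still sees Union[bool, NoDefault] here, hence the ignore.
    return flag  # type: ignore[return-value]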

Contributor Author

> @kurchi1205 can you add a release note.

I assume the release note should be added in v1.4.0.rst

@Dr-Irv (Contributor) Sep 9, 2021

> can you restore this too. The mypy error needs to be ignored as mypy does not know that lib.no_default is a singleton.

I"ve done this in 45f54d6

Contributor

> @kurchi1205 can you add a release note.

> I assume the release note should be added in v1.4.0.rst

We discussed at pandas dev meeting today, and put this in v1.3.3.rst

pandas/core/groupby/groupby.py (review thread resolved)
pandas/tests/groupby/test_function.py (outdated, review thread resolved)
@Dr-Irv (Contributor) commented Sep 9, 2021

> @kurchi1205 can you add a release note.

I"ve done this in 45f54d6

@Dr-Irv (Contributor) commented Sep 9, 2021

@jbrockmendel @simonjayhawkins @jreback Based on discussion at today's meeting, I updated the PR based on previous comments. Please review to see if it's good to go for 1.3.3.

@jreback (Contributor) commented Sep 9, 2021

lgtm. In theory this might have a small penalty, but it's OK as we need the correct results.

@Dr-Irv (Contributor) commented Sep 9, 2021

> lgtm. In theory this might have a small penalty, but it's OK as we need the correct results.

There's a failure https://github.com/pandas-dev/pandas/pull/43154/checks?check_run_id=3551554045#step:7:70 that needs to be investigated before we can merge.

@Dr-Irv (Contributor) commented Sep 9, 2021

> There's a failure https://github.com/pandas-dev/pandas/pull/43154/checks?check_run_id=3551554045#step:7:70 that needs to be investigated before we can merge.

I put something in to address this failure related to looking for the FutureWarning. See d46091e
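
For context, warning-related assertions in pandas tests are typically written with pandas._testing.assert_produces_warning; a minimal, generic sketch of that kind of check (not the actual change in d46091e):

import warnings

import pandas._testing as tm


def test_emits_future_warning():
    # The block fails if no FutureWarning is raised inside it.
    with tm.assert_produces_warning(FutureWarning):
        warnings.warn("the default will change", FutureWarning)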

@Dr-Irv (Contributor) commented Sep 9, 2021

@jreback I think this is good to go. Two CI failures. One is an issue in building the docs due to a connection failure. The other is the Windows timeout.

@jreback (Contributor) left a comment

lgtm thanks @kurchi1205 and @Dr-Irv

@jreback jreback merged commit 7d790cf into pandas-dev:master Sep 9, 2021
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Sep 9, 2021
jreback pushed a commit that referenced this pull request Sep 9, 2021 (#43481)

Co-authored-by: Prerana Chakraborty <40196782+kurchi1205@users.noreply.github.com>
@jreback (Contributor) commented Sep 9, 2021

@simonjayhawkins IIRC there were some duplicates of the original issue; we can now check whether they are solved. Can you link them?

@Dr-Irv (Contributor) commented Sep 9, 2021

> @simonjayhawkins IIRC there were some duplicates of the original issue; we can now check whether they are solved. Can you link them?

There was the original issue #42395 and a "duplicate" that I created #43108. @kurchi1205 added tests from both of those. Not aware of any others.

@simonjayhawkins (Member)

> @simonjayhawkins IIRC there were some duplicates of the original issue; we can now check whether they are solved. Can you link them?

> There was the original issue #42395 and a "duplicate" that I created #43108. @kurchi1205 added tests from both of those. Not aware of any others.

There was also #43209 that was a regression from the same commit #41706, but it doesn't look like the change in this PR has fixed that one.

@simonjayhawkins (Member)

this test was added in #41706

sorry this was changed... from #43154 (comment)

(there was also a change to test_cython_agg_nothing_to_agg which I'm not sure about. It looks like a case that used to raise now returns an empty DataFrame; I haven't checked whether that is a bugfix or an API change, so I'm considering that one out of scope for now)

I'll open a new issue rather than continue the discussion here.
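
For readers, the kind of case referred to above is roughly the following (an assumed reproduction, not taken from the actual test): a groupby reduction over a frame with no numeric columns at all.

import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "txt": ["x", "y", "z"]})

# Nothing here is numeric, so with numeric_only=True there is nothing to
# aggregate; depending on the pandas version this either raises or returns
# an empty result indexed by the group keys.
result = df.groupby("key").sum(numeric_only=True)
print(result)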

@simonjayhawkins (Member)

> I'll open a new issue rather than continue the discussion here.

#43501

Labels: Groupby, Nuisance Columns (Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply)
Projects: None yet
7 participants