Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating _resolve_numeric_only function of GroupBy #43154

Merged
merged 38 commits into from
Sep 9, 2021
Merged
Show file tree
Hide file tree
Changes from 29 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
6b0f5e7
Updating _resolve_numeric_only function of GroupBy
kurchi1205 Aug 21, 2021
116534f
Update groupby.py
kurchi1205 Aug 21, 2021
7b5ecb4
Update test_aggregate.py
kurchi1205 Aug 21, 2021
7faf1fc
Update groupby.py
kurchi1205 Aug 21, 2021
d773e7a
Update groupby.py
kurchi1205 Aug 21, 2021
5b4e799
Update test_aggregate.py
kurchi1205 Aug 21, 2021
3d2e78e
Update groupby.py
kurchi1205 Aug 21, 2021
5836f91
Update groupby.py
kurchi1205 Aug 21, 2021
1eb0e25
Update groupby.py
kurchi1205 Aug 21, 2021
a0391e6
Update test_aggregate.py
kurchi1205 Aug 21, 2021
2a1835a
Update test_aggregate.py
kurchi1205 Aug 21, 2021
66ecb96
Update test_aggregate.py
kurchi1205 Aug 21, 2021
c80fa9e
Update groupby.py
kurchi1205 Aug 22, 2021
95b6533
Update groupby.py
kurchi1205 Aug 22, 2021
fa8a8ca
Update test_aggregate.py
kurchi1205 Aug 22, 2021
e3f6767
Update test_aggregate.py
kurchi1205 Aug 22, 2021
ed668e6
Update groupby.py
kurchi1205 Aug 22, 2021
f30855a
Update groupby.py
kurchi1205 Aug 22, 2021
65261b4
Update groupby.py
kurchi1205 Aug 22, 2021
2477aa3
Update groupby.py
kurchi1205 Aug 22, 2021
da24f29
Update groupby.py
kurchi1205 Aug 22, 2021
4075353
Update groupby.py
kurchi1205 Aug 24, 2021
463a7c9
Update test_aggregate.py
kurchi1205 Aug 24, 2021
dd562c5
Update frame.py
kurchi1205 Aug 24, 2021
f400174
Making the necessary to _resolve_numeric_only and adding testcase
kurchi1205 Aug 29, 2021
56f28d7
Checking for empty DataFrame
kurchi1205 Aug 29, 2021
4fd254b
Commiting
kurchi1205 Aug 30, 2021
a3b09e6
Adding test cases
kurchi1205 Aug 30, 2021
f7e2d59
No changes
kurchi1205 Aug 30, 2021
270549a
Changes in tests and warning
kurchi1205 Sep 1, 2021
17808bf
Solving the errors
kurchi1205 Sep 1, 2021
08b2429
Solving the errors
kurchi1205 Sep 1, 2021
e141123
Merge remote-tracking branch 'upstream/master' into pr43154
Dr-Irv Sep 9, 2021
45f54d6
whatsnew 1.3.3, move tests, restore mypy
Dr-Irv Sep 9, 2021
0012423
add back blank line
Dr-Irv Sep 9, 2021
a9a63e7
Merge branch 'master' into issue_43108
jreback Sep 9, 2021
d46091e
add FutureWarning. Avoid Int64Index
Dr-Irv Sep 9, 2021
a39e5ce
Merge remote-tracking branch 'upstream/master' into issue_43108
Dr-Irv Sep 9, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 7 additions & 7 deletions pandas/core/groupby/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -1102,14 +1102,11 @@ def _wrap_applied_output(self, data, keys, values, not_indexed_same: bool = Fals
def _resolve_numeric_only(self, numeric_only: bool | lib.NoDefault) -> bool:
"""
Determine subclass-specific default value for 'numeric_only'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you restore this whitespace. docstrings should start with a one-liner summary distinct from the expanded description.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I"ve done this in 45f54d6

For SeriesGroupBy we want the default to be False (to match Series behavior).
For DataFrameGroupBy we want it to be True (for backwards-compat).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore this one too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I"ve done this in 45f54d6

Parameters
----------
numeric_only : bool or lib.no_default

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I"ve done this in 45f54d6

Returns
-------
bool
Expand All @@ -1120,12 +1117,15 @@ def _resolve_numeric_only(self, numeric_only: bool | lib.NoDefault) -> bool:
if self.obj.ndim == 2:
# i.e. DataFrameGroupBy
numeric_only = True
obj = self._obj_with_exclusions
jreback marked this conversation as resolved.
Show resolved Hide resolved
check = obj._get_numeric_data()
if len(obj.columns) and not len(check.columns) and not obj.empty:
warnings.warn("... Explicitly pass numeric_only ...")
rhshadrach marked this conversation as resolved.
Show resolved Hide resolved
numeric_only = False
jreback marked this conversation as resolved.
Show resolved Hide resolved

else:
numeric_only = False

# error: Incompatible return value type (got "Union[bool, NoDefault]",
# expected "bool")
return numeric_only # type: ignore[return-value]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you restore this too. The mypy error needs to be ignored as mypy does not know that lib.no_default is a singleton.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kurchi1205 can you add a release note.

I assume the release note should be added in v1.4.0.rst

Copy link
Contributor

@Dr-Irv Dr-Irv Sep 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you restore this too. The mypy error needs to be ignored as mypy does not know that lib.no_default is a singleton.

I"ve done this in 45f54d6

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kurchi1205 can you add a release note.

I assume the release note should be added in v1.4.0.rst

We discussed at pandas dev meeting today, and put this in v1.3.3.rst

return numeric_only

# -----------------------------------------------------------------
# numba
Expand Down
65 changes: 65 additions & 0 deletions pandas/tests/groupby/aggregate/test_aggregate.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,10 @@
from pandas import (
DataFrame,
Index,
Int64Index,
MultiIndex,
Series,
Timedelta,
concat,
)
import pandas._testing as tm
Expand Down Expand Up @@ -115,6 +117,69 @@ def test_groupby_aggregation_mixed_dtype():
tm.assert_frame_equal(result, expected)


def test_groupby_aggregation_non_numeric_dtype():
rhshadrach marked this conversation as resolved.
Show resolved Hide resolved
# GH #43108
df = DataFrame(
[["M", [1]], ["M", [1]], ["W", [10]], ["W", [20]]], columns=["MW", "v"]
)

expected = DataFrame(
{
"v": [[1, 1], [10, 20]],
},
index=Index(["M", "W"], dtype="object", name="MW"),
)

with tm.assert_produces_warning(UserWarning):
g = df.groupby(by=["MW"])
rhshadrach marked this conversation as resolved.
Show resolved Hide resolved
result = g.sum()
tm.assert_frame_equal(result, expected)
rhshadrach marked this conversation as resolved.
Show resolved Hide resolved


def test_groupby_aggregation_multi_non_numeric_dtype():
# GH #42395
df = DataFrame(
{
"x": [1, 0, 1, 1, 0],
"y": [Timedelta(i, "days") for i in range(1, 6)],
"z": [Timedelta(i * 10, "days") for i in range(1, 6)],
}
)

expected = DataFrame(
{
"y": [Timedelta(i, "days") for i in range(7, 9)],
"z": [Timedelta(i * 10, "days") for i in range(7, 9)],
},
index=Int64Index([0, 1], dtype="int64", name="x"),
)

with tm.assert_produces_warning(UserWarning):
g = df.groupby(by=["x"])
result = g.sum()
tm.assert_frame_equal(result, expected)
rhshadrach marked this conversation as resolved.
Show resolved Hide resolved


def test_groupby_aggregation_numeric_with_non_numeric_dtype():
# GH #43108
df = DataFrame(
{
"x": [1, 0, 1, 1, 0],
"y": [Timedelta(i, "days") for i in range(1, 6)],
"z": [i for i in range(1, 6)],
}
)

expected = DataFrame(
{"z": [7, 8]},
index=Int64Index([0, 1], dtype="int64", name="x"),
)

g = df.groupby(by=["x"])
result = g.sum()
tm.assert_frame_equal(result, expected)


def test_groupby_aggregation_multi_level_column():
# GH 29772
lst = [
Expand Down