BUG: Fix unexpected sort in groupby #17621

Licht-T · 2017-09-22T07:41:49Z

- closes BUG: df.groupby(sort=False) sorts multi-index-frames #17537
- tests added / passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry
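
For context, the behavior targeted here can be illustrated with a small sketch. The frame, index values, and variable names below are made up for illustration and are not taken from the linked issue; the sketch assumes a pandas version that includes this fix. With `sort=False`, groups are expected to appear in order of first appearance rather than in sorted order.

```python
import pandas as pd

# Hypothetical two-level index whose first level is deliberately unsorted.
index = pd.MultiIndex.from_tuples(
    [("b", 1), ("a", 1), ("b", 2)], names=["key", "n"]
)
df = pd.DataFrame({"value": [1, 2, 3]}, index=index)

# Group on a single level of the MultiIndex without sorting the group keys.
unsorted = df.groupby(level=0, sort=False).sum()

# The same selection written as a one-element list of levels.
from_list = df.groupby(level=[0], sort=False).sum()

# With sort=True the group keys come back in sorted order.
sorted_ = df.groupby(level=0, sort=True).sum()

print(unsorted)
print(sorted_)
```

Comparing `unsorted.index` with `sorted_.index` shows whether `sort=False` is being honored for single-level selection.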

codecov · 2017-09-22T08:57:45Z

Codecov Report

Merging #17621 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17621      +/-   ##
==========================================
- Coverage    91.2%   91.16%   -0.04%     
==========================================
  Files         163      163              
  Lines       49637    49643       +6     
==========================================
- Hits        45269    45259      -10     
- Misses       4368     4384      +16

Flag	Coverage Δ
#multiple	`88.95% <100%> (-0.02%)`	⬇️
#single	`40.18% <0%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/groupby.py	`92.24% <100%> (+0.02%)`	⬆️
pandas/core/generic.py	`91.98% <100%> (ø)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/indexes/multi.py	`96.39% <0%> (-0.51%)`	⬇️
pandas/core/indexes/category.py	`98.26% <0%> (-0.29%)`	⬇️
pandas/core/frame.py	`97.77% <0%> (-0.1%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8276a42...1e13713.

codecov · 2017-09-22T08:57:45Z

Codecov Report

Merging #17621 into master will decrease coverage by 0.05%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17621      +/-   ##
==========================================
- Coverage   91.27%   91.21%   -0.06%     
==========================================
  Files         163      163              
  Lines       49765    49770       +5     
==========================================
- Hits        45421    45399      -22     
- Misses       4344     4371      +27

Flag	Coverage Δ
#multiple	`89.01% <100%> (-0.04%)`	⬇️
#single	`40.32% <0%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/generic.py	`92.07% <100%> (ø)`	⬆️
pandas/core/groupby.py	`92.25% <100%> (+0.01%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/plotting/_converter.py	`63.38% <0%> (-1.82%)`	⬇️
pandas/core/indexes/multi.py	`96.39% <0%> (-0.51%)`	⬇️
pandas/core/indexes/category.py	`97.46% <0%> (-0.29%)`	⬇️
pandas/core/frame.py	`97.73% <0%> (-0.1%)`	⬇️
pandas/core/indexes/base.py	`96.34% <0%> (+0.05%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ad7d051...9b6a3da.

jreback · 2017-09-22T13:02:05Z

pandas/core/groupby.py

@@ -2613,6 +2613,13 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,

            level = None
            key = group_axis
+        elif key is None:


hate adding logic here, this function is already impenetrable, can you incorporate this to existing?

jreback · 2017-09-22T13:02:40Z

pandas/tests/groupby/test_groupby.py

        assert_frame_equal(result0, expected0)
        assert_frame_equal(result1, expected1)

        # axis=1

-        result0 = frame.T.groupby(level=0, axis=1).sum()
-        result1 = frame.T.groupby(level=1, axis=1).sum()


for a couple of these that you changed can you also add the sort=True case (maybe parametrize on sort=)?
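
A test parametrized on `sort` might look roughly like the sketch below. The test name and data are hypothetical and this is not the test added in this PR; it only shows the shape of the suggestion, assuming `pytest` and a pandas version that includes the fix.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize("sort", [True, False])
def test_groupby_level_respects_sort(sort):
    # GH 17537: single-level selection from a MultiIndex should honor sort=.
    # Hypothetical data; the first level is deliberately unsorted.
    index = pd.MultiIndex.from_tuples(
        [("b", 1), ("a", 1), ("b", 2)], names=["key", "n"]
    )
    frame = pd.DataFrame({"value": [1, 2, 3]}, index=index)

    result = frame.groupby(level=0, sort=sort).sum()

    # Sorted keys when sort=True, first-appearance order otherwise.
    expected_order = ["a", "b"] if sort else ["b", "a"]
    assert list(result.index) == expected_order
```

Running both cases from one test body keeps the `sort=True` and `sort=False` expectations side by side.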

jreback

pls add a whatsnew note

pep8speaks · 2017-09-22T15:11:03Z

Hello @Licht-T! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 29, 2017 at 11:42 Hours UTC

Licht-T · 2017-09-22T15:41:18Z

@jreback Thanks for your review.

- Moved the fix to a different location to improve readability
- Parametrized the tests on the sort parameter of groupby

jreback · 2017-09-22T16:16:43Z

pandas/core/groupby.py

@@ -2626,6 +2626,14 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
    elif isinstance(key, BaseGrouper):
        return key, [], obj

+    if key is None and isinstance(group_axis, MultiIndex):


can you simplify this. e.g. maybe put this coercion higher (before the giant if/then)

jreback · 2017-09-22T16:16:58Z

pandas/tests/groupby/test_groupby.py

@@ -1791,18 +1791,19 @@ def aggfun(ser):
        agged2 = df.groupby(keys).aggregate(aggfun)
        assert len(agged2.columns) + 1 == len(df.columns)

-    def test_groupby_level(self):
+    @pytest.mark.parametrize('sort', [True, False])
+    def test_groupby_level(self, sort):
        frame = self.mframe


add the issue number here as well as a comment

Licht-T · 2017-09-22T18:13:11Z

@jreback Thank you for the comments. Fixed.

jreback · 2017-09-24T13:24:21Z

pandas/core/groupby.py

@@ -2586,6 +2586,15 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
    """
    group_axis = obj._get_axis(axis)

+    if key is None and level is not None and \
+       isinstance(group_axis, MultiIndex):


so maybe move this down a bit (under level is not None). I don't want this to be a bespoke condition. I think you can remove the isinstance check of MultiIndex.

jreback · 2017-09-24T13:25:10Z

pandas/core/groupby.py

+            level = level[0]
+
+        if is_scalar(level):
+            key = group_axis.get_level_values(level)


put a comment on what is going on here. maybe we can incorporate this below as well.

I am trying to remove as many special cases as possible.
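
The coercion under discussion can be sketched as a standalone function. `level_to_key` is a hypothetical helper written for illustration from the diff fragments quoted above; it is not the pandas source, and the real `_get_grouper` does considerably more. It relies only on the public `pandas.api.types.is_list_like` and `is_scalar` helpers and on `MultiIndex.get_level_values`.

```python
import pandas as pd
from pandas.api.types import is_list_like, is_scalar


def level_to_key(group_axis, key=None, level=None):
    """Hypothetical helper: turn a single-level selection into a key."""
    if level is not None and isinstance(group_axis, pd.MultiIndex):
        # A one-element list of levels is treated like a scalar level.
        if is_list_like(level) and len(level) == 1:
            level = level[0]

        # With no explicit key, group by the values of that level so the
        # caller's sort= setting applies to them like any other key.
        if key is None and is_scalar(level):
            key = group_axis.get_level_values(level)
            level = None
    return key, level


axis = pd.MultiIndex.from_tuples(
    [("b", 1), ("a", 1), ("b", 2)], names=["key", "n"]
)
key, level = level_to_key(axis, level=[0])
print(list(key), level)
```

A multi-level selection such as `level=[0, 1]` passes through unchanged in this sketch, which is the case the comment above leaves for later cleanup.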

TomAugspurger

@Licht-T this looks pretty close. Could you fixup the merge conflict and take a look at @jreback's comments in the next couple days?

TomAugspurger · 2017-09-25T21:02:01Z

doc/source/whatsnew/v0.21.0.txt

@@ -538,6 +538,7 @@ Groupby/Resample/Rolling
 - Bug in ``Series.resample(...).apply()`` where an empty ``Series`` modified the source index and did not return the name of a ``Series`` (:issue:`14313`)
 - Bug in ``.rolling(...).apply(...)`` with a ``DataFrame`` with a ``DatetimeIndex``, a ``window`` of a timedelta-convertible and ``min_periods >= 1` (:issue:`15305`)
 - Bug in ``DataFrame.groupby`` where index and column keys were not recognized correctly when the number of keys equaled the number of elements on the groupby axis (:issue:`16859`)
+- Bug in ``DataFrame.groupby`` where the single level selection from ``MultiIndex`` occurs unexpected index sorting (:issue:`17537`)


"occurs" -> "incurs"? Or maybe "causes"?

Licht-T · 2017-09-26T13:02:35Z

@TomAugspurger Okay. I'll do that.

jreback · 2017-09-26T13:17:08Z

doc/source/whatsnew/v0.21.0.txt

@@ -538,6 +538,7 @@ Groupby/Resample/Rolling
 - Bug in ``Series.resample(...).apply()`` where an empty ``Series`` modified the source index and did not return the name of a ``Series`` (:issue:`14313`)
 - Bug in ``.rolling(...).apply(...)`` with a ``DataFrame`` with a ``DatetimeIndex``, a ``window`` of a timedelta-convertible and ``min_periods >= 1` (:issue:`15305`)
 - Bug in ``DataFrame.groupby`` where index and column keys were not recognized correctly when the number of keys equaled the number of elements on the groupby axis (:issue:`16859`)
+- Bug in ``DataFrame.groupby`` where the single level selection from ``MultiIndex`` occurs unexpected index sorting (:issue:`17537`)


where a single level selection from a MultiIndex unexpectedly sorts.

jreback · 2017-09-27T11:44:44Z

pandas/core/groupby.py

    # axis of the object
    if level is not None:
-        if not isinstance(group_axis, MultiIndex):
+        # TODO: These two conditions are almost same.


ok for now. can you come back in a future PR and see what we can do with all the conditions in this section. getting pretty unwieldy (and document as much as possible).

Licht-T · 2017-09-27

@jreback Okay, I'll do that. I think these conditions are too complicated to refactor in this PR.

jreback · 2017-09-27T11:45:51Z

pandas/core/groupby.py

+        # TODO: These two conditions are almost same.
+        # We should combine two.
+        if isinstance(group_axis, MultiIndex):
+            if is_list_like(level) and len(level) == 1:


actually this condition I think you can pull out of the MultiIndex check here (as the else is the same condition)

Licht-T · 2017-09-27

@jreback I am aware of this, but the else branch seems to contain some handling that applies only to non-MultiIndex axes. We have to consider carefully whether it is also applicable to a MultiIndex.
https://github.com/pandas-dev/pandas/pull/17621/files/e4cdd0726e685b0216056ba224ed363bf1e836f9#diff-720d374f1a709d0075a1f0a02445cd65R2618

If it is applicable, we also have to check that it has no side effects on the subsequent processing.

jreback · 2017-09-28T14:40:26Z

can you rebase

Licht-T · 2017-09-28T16:58:54Z

@jreback Rebased.

jreback · 2017-09-29T10:29:27Z

@Licht-T can you rebase and push once again, want to get all green here.

Licht-T · 2017-09-29T14:59:26Z

@jreback Now all green!

jreback · 2017-10-01T14:53:48Z

thanks @Licht-T

* 'master' of github.com:pandas-dev/pandas: (188 commits)
  Separate out _convert_datetime_to_tsobject (pandas-dev#17715)
  DOC: remove whatsnew note for xref pandas-dev#17131
  BUG: Regression in .loc accepting a boolean Index as an indexer (pandas-dev#17738)
  DEPR: Deprecate cdate_range and merge into bdate_range (pandas-dev#17691)
  CLN: replace %s syntax with .format in pandas.core: categorical, common, config, config_init (pandas-dev#17735)
  Fixed the memory usage explanation of categorical in gotchas from O(nm) to O(n+m) (pandas-dev#17736)
  TST: add backward compat for offset testing for pickles (pandas-dev#17733)
  remove unused time conversion funcs (pandas-dev#17711)
  DEPR: Deprecate convert parameter in take (pandas-dev#17352)
  BUG:Time Grouper bug fix when applied for list groupers (pandas-dev#17587)
  BUG: Fix some PeriodIndex resampling issues (pandas-dev#16153)
  BUG: Fix unexpected sort in groupby (pandas-dev#17621)
  DOC: Fixed typo in documentation for 'pandas.DataFrame.replace' (pandas-dev#17731)
  BUG: Fix series rename called with str altering name rather index (GH17407) (pandas-dev#17654)
  DOC: Add examples for MultiIndex.get_locs + cleanups (pandas-dev#17675)
  Doc improvements for IntervalIndex and Interval (pandas-dev#17714)
  BUG: DataFrame sort_values and multiple "by" columns fails to order NaT correctly
  Last of the timezones funcs (pandas-dev#17669)
  Add missing file to _pyxfiles, delete commented-out (pandas-dev#17712)
  update imports of DateParseError, remove unused imports from tslib (pandas-dev#17713)
  ...

Licht-T changed the title ~~Fix unexpected sort groupby~~ BUG: Fix unexpected sort in groupby Sep 22, 2017

jreback reviewed Sep 22, 2017

jreback requested changes Sep 22, 2017

jreback added Bug Groupby MultiIndex Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Sep 22, 2017

Licht-T force-pushed the fix-unexpected-sort-groupby branch 2 times, most recently from 576a6cc to 5567ac1 Compare September 22, 2017 15:24

jreback reviewed Sep 22, 2017

jreback requested changes Sep 24, 2017

TomAugspurger reviewed Sep 25, 2017

TomAugspurger added this to the 0.21.0 milestone Sep 25, 2017

jreback reviewed Sep 26, 2017

Licht-T force-pushed the fix-unexpected-sort-groupby branch from 24409e9 to e4cdd07 Compare September 27, 2017 11:36

jreback reviewed Sep 27, 2017

Licht-T added 3 commits September 29, 2017 01:12

BUG: Fix unexpected sort behavior when single level groupby from MultiIndex

0e7bdb3

BUG: Fix unexpected sort behavior on aggregation

c3a1701

TST: Fix existing tests for groupby

7b23e65

Licht-T force-pushed the fix-unexpected-sort-groupby branch from 455a60b to 9962d61 Compare September 28, 2017 16:33

DOC: Add the description for fix unexpected sort in groupby in whatsnew note

9b6a3da

Licht-T force-pushed the fix-unexpected-sort-groupby branch from 9962d61 to 9b6a3da Compare September 29, 2017 11:41

jreback approved these changes Oct 1, 2017

jreback merged commit fd336fb into pandas-dev:master Oct 1, 2017

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

BUG: Fix unexpected sort in groupby (pandas-dev#17621)

8f32f68

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

BUG: Fix unexpected sort in groupby (pandas-dev#17621)

065b848
