BUG: groupby with as_index=False shouldn't modify grouping columns #34012
Force-pushed from cf8c847 to 4fd773c.
doc/source/whatsnew/v1.1.0.rst (outdated)
@@ -803,6 +803,7 @@ Groupby/resample/rolling
- Bug in :meth:`DataFrame.groupby` where a ``ValueError`` would be raised when grouping by a categorical column with read-only categories and ``sort=False`` (:issue:`33410`)
- Bug in :meth:`GroupBy.first` and :meth:`GroupBy.last` where None is not preserved in object dtype (:issue:`32800`)
- Bug in :meth:`Rolling.min` and :meth:`Rolling.max`: Growing memory usage after multiple calls when using a fixed window (:issue:`30726`)
+- Bug in :meth:`DataFrame.groupby` when using ``as_index=False`` would modify the grouping column when used with ``idxmax``, ``idxmin``, ``mad``, ``nunique``, and ``skew`` (:issue:`21090`)
Can you make an API-breaking section that shows a before/after (pick a function like `nunique`) to show what has changed?
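A minimal sketch of the before/after being asked for, using `nunique` on a toy frame (the frame and its column names are made up for illustration). It assumes pandas 1.1 or later, where the fix is in place: column `a` should keep the group keys. Before the fix, `a` was itself reduced by `nunique`, so it would have shown a count (1 in every row) instead of the keys.

```python
import pandas as pd

# Toy frame: "a" is the grouping column, "b" holds the values.
df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1, 2, 2]})

# With as_index=False, the grouping column should come back unchanged
# as a regular column, next to the per-group counts for "b".
result = df.groupby("a", as_index=False).nunique()
print(result)
```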
You don't need this note, as you have the one above.
Ah, missed this. Thanks for catching it.
@jreback Changes made, and checks are green.
pandas/core/groupby/groupby.py (outdated)
-    def _python_apply_general(self, f):
-        keys, values, mutated = self.grouper.apply(f, self._selected_obj, self.axis)
+    def _python_apply_general(self, f, obj):
+        keys, values, mutated = self.grouper.apply(f, obj, self.axis)
Can you add a docstring and types here (at least for the added args)?
While adding the docstring, I realized that the name `obj` here is very confusing, as it is not the same as `self.obj`. I renamed it to `data`, which is the name used in `grouper.apply`.
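To illustrate the kind of docstring and type hints discussed here, below is a hypothetical, standalone simplification. It is not the pandas method: `python_apply_general` and its `keys` parameter are made-up names, and plain `DataFrame.groupby` stands in for `self.grouper.apply`. The point it shows is that `data` is passed explicitly and need not be `self.obj`.

```python
from typing import Any, Callable, Dict, Hashable

import pandas as pd


def python_apply_general(
    f: Callable[[pd.DataFrame], Any],
    data: pd.DataFrame,
    keys: pd.Series,
) -> Dict[Hashable, Any]:
    """
    Apply function ``f`` to each group of ``data`` (hypothetical sketch).

    Parameters
    ----------
    f : callable
        Function to apply to each group.
    data : DataFrame
        The frame to split and apply ``f`` to. It is passed explicitly and is
        not necessarily the same as ``self.obj``; for example, it may have the
        grouping columns excluded.
    keys : Series
        Group labels, aligned with ``data`` on the index.

    Returns
    -------
    dict
        Mapping from each group key to the result of ``f`` on that group.
    """
    return {key: f(group) for key, group in data.groupby(keys)}


df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1, 2, 2]})
# Pass the frame *without* the grouping column as ``data``.
out = python_apply_general(lambda g: g["b"].sum(), df.drop(columns=["a"]), df["a"])
print(out)
```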
Force-pushed from cf93e12 to 5eb636d.
@jreback Changes made, and checks are green.
pandas/core/groupby/generic.py (outdated)
@@ -1898,6 +1908,9 @@ def groupby_series(obj, col=None):
if not self.as_index:
results.index = ibase.default_index(len(results))
+if results.ndim == 1:
Is this branch hit anywhere?
No, it is not. `obj_with_exclusions` is always a DataFrame, so I've removed the Series path in this function.
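A small sketch of why the `ndim == 1` branch cannot be reached, using only public API. It assumes that dropping the grouping column with `DataFrame.drop` is a fair stand-in for the private `_obj_with_exclusions`: even when a single column remains, the result is still a 2-D DataFrame, and so is its per-group reduction.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1, 2, 2]})

# Public-API stand-in for ``_obj_with_exclusions``: the frame with the
# grouping column dropped. Only one column remains, but it is still a DataFrame.
without_keys = df.drop(columns=["a"])

# Reducing it per group also yields a DataFrame, so ``results.ndim`` is 2
# and an ``if results.ndim == 1`` branch is never taken.
results = without_keys.groupby(df["a"]).nunique()
print(type(without_keys).__name__, without_keys.ndim)
print(type(results).__name__, results.ndim)
```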
doc/source/whatsnew/v1.1.0.rst (outdated)
Using :meth:`DataFrame.groupby` with ``as_index=False`` and the function ``idxmax``, ``idxmin``, ``mad``, ``nunique``, or ``skew`` would modify the grouping column. Now, the grouping column remains unchanged. (:issue:`21090`)

.. ipython:: python
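The note above covers more than `nunique`. As a sketch, here is the same check with `idxmax` on a toy frame, assuming a pandas version that includes this fix: column `a` should hold the group keys, and `b` the index label of each group's maximum.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1, 3, 2]})

# With as_index=False, "a" should keep the group keys rather than being
# replaced by the idxmax of the grouping column itself.
result = df.groupby("a", as_index=False).idxmax()
print(result)
```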
This also changes `nunique` for `as_index=True`; can you put that example here as well (first)?
Do this in the same note, as it's very confusing to read otherwise.
@jreback Changes made, and tests pass. It seemed best to me to add a new API-breaking section for `nunique` when `as_index=True`. While the code changes to `nunique` were the same for both issues, from an API standpoint I think they are different cases. If you'd like me to combine them into one section, though, I can do that.
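For the `as_index=True` case, a minimal sketch on a toy frame, assuming pandas 1.1 or later: the grouping column should appear only in the index, rather than also being returned as a column of `nunique` counts.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1, 2, 2]})

# With as_index=True (the default), "a" should be the index only;
# the columns of the result are just the non-grouping columns.
result = df.groupby("a").nunique()
print(result)
```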
Force-pushed from 5ede7e5 to 24ad221.
Force-pushed from c6bd67d to 84e2ce5.
Force-pushed from 84e2ce5 to 3a0e80b.
Looks good; a few more comments on the doc note.
Small comment; otherwise LGTM.
Thanks @rhshadrach! Fixing groupby bugs can take a nontrivial amount of time, so thanks for sticking with it, and keep 'em coming!
agg() will return groups as index #32579
passes `black pandas`
passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
This makes any groupby function that uses `_GroupBy._make_wrapper` act on `self._obj_with_exclusions` rather than `self._selected_obj`, in parallel with the Cython paths. As in the Cython paths, the grouping columns are then added back onto the result in `_wrap_applied_output`. The same applies to `nunique`, which does not go through `_make_wrapper`.
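The approach described above (operate on the data with the grouping columns excluded, then add the keys back) can be sketched with public API only. `reduce_excluding_keys` is a hypothetical helper written for illustration, not a pandas function, and it handles a single grouping column only; `reset_index` plays the role of re-inserting the keys for the `as_index=False` case.

```python
import pandas as pd


def reduce_excluding_keys(df: pd.DataFrame, key: str, func: str) -> pd.DataFrame:
    """Hypothetical helper: reduce the non-grouping columns, then add the key back."""
    # Step 1: operate on the data with the grouping column excluded,
    # so the reduction can never touch (and modify) the key column.
    values = df.drop(columns=[key])
    reduced = values.groupby(df[key]).agg(func)
    # Step 2: add the grouping column back onto the result as a regular column.
    return reduced.reset_index()


df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1, 2, 2]})
out = reduce_excluding_keys(df, "a", "nunique")
print(out)
```

The result should match what `df.groupby("a", as_index=False).nunique()` returns once the fix is in place.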
After finishing this, I found PR #29131. This PR does not change the behavior of calling `apply()` itself, only the code paths that previously went through `apply` internally. Also, the change made in #29131 has no impact when `as_index` is `False`.