-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor index-as-string groupby tests and fix spurious warning (Bug 17383) #17843
Conversation
- Extract to separate file (test_index_as_string.py) - Parameterize over test DataFrames - Add series test case - Update test_grouper_column_index_level_precedence to reproduce false warning problem as described in GH17383 - Update test_grouper_column_index_level_precedence to verify when warning shouldn't be raised (Results in test failure due to GH17383)
Codecov Report
@@ Coverage Diff @@
## master #17843 +/- ##
==========================================
- Coverage 91.22% 91.22% -0.01%
==========================================
Files 163 163
Lines 50014 50075 +61
==========================================
+ Hits 45627 45679 +52
- Misses 4387 4396 +9
Continue to review full report at Codecov.
|
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False): | ||
result = df_multi_both.groupby('inner').mean() | ||
|
||
expected = df_multi_both.groupby([pd.Grouper(key='inner')]).mean() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pd.Grouper
object in this expected
expression shouldn't have been wrapped in a list. If it had not been, the spurious warning would have been raised in this test. This is corrected in the new test below.
result = frame.groupby('inner').mean() | ||
|
||
with tm.assert_produces_warning(False): | ||
expected = frame.groupby(pd.Grouper(key='inner')).mean() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the pd.Grouper
object is no longer wrapped in a list and that we now assert that no warning is raised. This is the test case that would have failed without the fix in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments
@@ -2704,7 +2704,7 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True, | |||
|
|||
# a passed-in Grouper, directly convert | |||
if isinstance(key, Grouper): | |||
binner, grouper, obj = key._get_grouper(obj) | |||
binner, grouper, obj = key._get_grouper(obj, validate=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that I had to add this flag to 'fix' this warning issue elsewhere, I don't really like it, but would require more refactoring to make this cleaner.
|
||
|
||
def build_df_multi(): | ||
idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('a', 3), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should just be fixtures
return series_multi | ||
|
||
|
||
class TestGroupByIndexAsString(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no real need to make this a class, that is really a leftover from nose, just make these functions (of course a class is good for grouping generally)
expected = frame.groupby(pd.Grouper(key='inner')).mean() | ||
|
||
assert_frame_equal(result, expected) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy to have even another level of parameterization to make these shorter here (if that's possible)
Thanks for the feedback @jreback. I think I've made all the changes you requested and I learned some things about pytest along the way. |
thank @jmmease nice patch! keep em coming! |
* upstream/master: (76 commits) CategoricalDtype construction: actually use fastpath (pandas-dev#17891) DEPR: Deprecate tupleize_cols in to_csv (pandas-dev#17877) BUG: Fix wrong column selection in drop_duplicates when duplicate column names (pandas-dev#17879) DOC: Adding examples to update docstring (pandas-dev#16812) (pandas-dev#17859) TST: Skip if no openpyxl in test_excel (pandas-dev#17883) TST: Catch read_html slow test warning (pandas-dev#17874) flake8 cleanup (pandas-dev#17873) TST: remove moar warnings (pandas-dev#17872) ENH: tolerance now takes list-like argument for reindex and get_indexer. (pandas-dev#17367) ERR: Raise ValueError when week is passed in to_datetime format witho… (pandas-dev#17819) TST: remove some deprecation warnings (pandas-dev#17870) Refactor index-as-string groupby tests and fix spurious warning (Bug 17383) (pandas-dev#17843) BUG: merging with a boolean/int categorical column (pandas-dev#17841) DEPR: Deprecate read_csv arguments fully (pandas-dev#17865) BUG: to_json - prevent various segfault conditions (GH14256) (pandas-dev#17857) CLN: Use pandas.core.common for None checks (pandas-dev#17816) BUG: set tz on DTI from fixed format HDFStore (pandas-dev#17844) RLS: v0.21.0rc1 Whatsnew cleanup (pandas-dev#17858) DEPR: Deprecate the convert parameter completely (pandas-dev#17831) ...
git diff upstream/master -u -- "*.py" | flake8 --diff
Test case refactoring:
test_groupby.py
and into a newtest_index_as_string.py
fileSeries
test_grouper_column_index_level_precedence
to reproduce false warning problem as described in Groupby with matching column and index name emits spurious warning #17383test_grouper_column_index_level_precedence
to verify when warnings should NOT be raised (Results in a test failure due to Groupby with matching column and index name emits spurious warning #17383 without this fix)