Refactor index-as-string groupby tests and fix spurious warning (Bug 17383) #17843

Merged: 5 commits merged into pandas-dev:master on Oct 14, 2017

jonmmease
Contributor

Test case refactoring:

Jon M. Mease added 4 commits October 10, 2017 16:43
  - Extract to separate file (test_index_as_string.py)
  - Parameterize over test DataFrames
  - Add series test case
  - Update test_grouper_column_index_level_precedence to reproduce false warning problem as described in GH17383
  - Update test_grouper_column_index_level_precedence to verify when warning shouldn't be raised (Results in test failure due to GH17383)

codecov bot commented Oct 10, 2017

Codecov Report

Merging #17843 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #17843      +/-   ##
==========================================
- Coverage   91.22%   91.22%   -0.01%     
==========================================
  Files         163      163              
  Lines       50014    50075      +61     
==========================================
+ Hits        45627    45679      +52     
- Misses       4387     4396       +9

| Flag | Coverage Δ |
|---|---|
| #multiple | 89.03% <100%> (+0.01%) ⬆️ |
| #single | 40.32% <0%> (+0.01%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/groupby.py | 91.98% <100%> (-0.02%) ⬇️ |
| pandas/io/gbq.py | 25% <0%> (-58.34%) ⬇️ |
| pandas/compat/numpy/function.py | 92.12% <0%> (-1.22%) ⬇️ |
| pandas/core/indexing.py | 92.82% <0%> (-0.19%) ⬇️ |
| pandas/io/formats/format.py | 95.94% <0%> (-0.13%) ⬇️ |
| pandas/core/frame.py | 97.75% <0%> (-0.12%) ⬇️ |
| pandas/core/computation/align.py | 97.89% <0%> (-0.05%) ⬇️ |
| pandas/core/reshape/concat.py | 97.57% <0%> (-0.04%) ⬇️ |
| pandas/core/indexes/base.py | 96.47% <0%> (-0.01%) ⬇️ |
| ... and 16 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 727ea20...3945107.

with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
    result = df_multi_both.groupby('inner').mean()

expected = df_multi_both.groupby([pd.Grouper(key='inner')]).mean()
Contributor Author

The pd.Grouper object in this expected expression shouldn't have been wrapped in a list. If it had not been, the spurious warning would have been raised in this test. This is corrected in the new test below.

result = frame.groupby('inner').mean()

with tm.assert_produces_warning(False):
    expected = frame.groupby(pd.Grouper(key='inner')).mean()
Contributor Author

Note that the pd.Grouper object is no longer wrapped in a list and that we now assert that no warning is raised. This is the test case that would have failed without the fix in this PR.
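
The "assert that no warning is raised" idea can be sketched with the standard library alone. This is only an illustration of the pattern, not how `tm.assert_produces_warning(False)` is implemented; `assert_no_warnings`, `quiet`, and `noisy` are hypothetical names introduced here.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def assert_no_warnings():
    """Fail if the wrapped block emits any warning (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones normally shown only once.
        warnings.simplefilter("always")
        yield
    assert not caught, "unexpected warnings: %r" % [str(w.message) for w in caught]


def quiet():
    # Stand-in for a call that should not warn.
    return 42


def noisy():
    # Stand-in for a call that emits a spurious FutureWarning.
    warnings.warn("ambiguous key", FutureWarning)
    return 42
```

Wrapping `quiet()` in `with assert_no_warnings():` passes silently, while wrapping `noisy()` raises `AssertionError`, which is the failure mode the new test relies on.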

Contributor

jreback left a comment

some comments

@@ -2704,7 +2704,7 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,

     # a passed-in Grouper, directly convert
     if isinstance(key, Grouper):
-        binner, grouper, obj = key._get_grouper(obj)
+        binner, grouper, obj = key._get_grouper(obj, validate=False)
Contributor

Note that I had to add this flag to 'fix' this warning issue elsewhere. I don't really like it, but making this cleaner would require more refactoring.
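
The shape of this kind of flag can be shown with a toy, standard-library-only sketch. Nothing below is pandas code: the `Grouper` class, `get_grouper`, and its arguments are hypothetical stand-ins, assuming the ambiguity check lives in the same function that resolves the key.

```python
import warnings


class Grouper:
    """Toy stand-in for an explicit grouping spec (not pandas' Grouper)."""

    def __init__(self, key):
        self.key = key


def get_grouper(columns, index_names, key, validate=True):
    """Resolve `key` to a column, warning if it is also an index level name."""
    if isinstance(key, Grouper):
        # An explicit Grouper(key=...) already says "use the column",
        # so the ambiguity check is skipped when resolving it.
        return get_grouper(columns, index_names, key.key, validate=False)
    if validate and key in columns and key in index_names:
        warnings.warn(
            "%r is both a column and an index level" % key, FutureWarning
        )
    return ("column", key)
```

With `columns=["inner", "A"]` and `index_names=["outer", "inner"]`, passing the string `"inner"` warns, while passing `Grouper("inner")` does not. The cost, as the comment above says, is an extra parameter threaded through the call.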



def build_df_multi():
    idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('a', 3),
Contributor

these should just be fixtures

    return series_multi


class TestGroupByIndexAsString(object):
Contributor

No real need to make this a class; that is really a leftover from nose. Just make these functions (of course, a class is good for grouping generally).

expected = frame.groupby(pd.Grouper(key='inner')).mean()

assert_frame_equal(result, expected)

Contributor

happy to have even another level of parameterization to make these shorter here (if that's possible)
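
One way the review suggestions (fixtures, plain test functions, another level of parameterization) might combine is sketched below, assuming pandas and pytest are installed. `make_frame`, the `frame` fixture, and the test name are hypothetical and are not the names used in this PR. The example also deliberately avoids a name that is both a column and an index level.

```python
import pandas as pd
import pytest


def make_frame():
    """Small frame with a two-level index (hypothetical test data)."""
    idx = pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["outer", "inner"]
    )
    return pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [1, 1, 2, 2]}, index=idx)


@pytest.fixture
def frame():
    # Fixture wrapper so each test receives a fresh copy of the data.
    return make_frame()


@pytest.mark.parametrize(
    "name, grouper",
    [
        ("inner", pd.Grouper(level="inner")),  # string refers to an index level
        ("B", pd.Grouper(key="B")),  # string refers to a column
    ],
)
def test_string_key_matches_grouper(frame, name, grouper):
    # Grouping by the bare string should match the explicit Grouper.
    result = frame.groupby(name).mean()
    expected = frame.groupby(grouper).mean()
    pd.testing.assert_frame_equal(result, expected)
```

Each parameter tuple becomes its own test case, so adding another string/`Grouper` pair is a one-line change rather than a new test function.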

@jonmmease
Contributor Author

Thanks for the feedback @jreback. I think I've made all the changes you requested and I learned some things about pytest along the way.

jreback added this to the 0.21.0 milestone Oct 14, 2017
jreback merged commit e001500 into pandas-dev:master Oct 14, 2017
Contributor

jreback commented Oct 14, 2017

thanks @jonmmease, nice patch! keep 'em coming!

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017
* upstream/master: (76 commits)
  CategoricalDtype construction: actually use fastpath (pandas-dev#17891)
  DEPR: Deprecate tupleize_cols in to_csv (pandas-dev#17877)
  BUG: Fix wrong column selection in drop_duplicates when duplicate column names (pandas-dev#17879)
  DOC: Adding examples to update docstring (pandas-dev#16812) (pandas-dev#17859)
  TST: Skip if no openpyxl in test_excel (pandas-dev#17883)
  TST: Catch read_html slow test warning (pandas-dev#17874)
  flake8 cleanup (pandas-dev#17873)
  TST: remove moar warnings (pandas-dev#17872)
  ENH: tolerance now takes list-like argument for reindex and get_indexer. (pandas-dev#17367)
  ERR: Raise ValueError when week is passed in to_datetime format witho… (pandas-dev#17819)
  TST: remove some deprecation warnings (pandas-dev#17870)
  Refactor index-as-string groupby tests and fix spurious warning (Bug 17383) (pandas-dev#17843)
  BUG: merging with a boolean/int categorical column (pandas-dev#17841)
  DEPR: Deprecate read_csv arguments fully (pandas-dev#17865)
  BUG: to_json - prevent various segfault conditions (GH14256) (pandas-dev#17857)
  CLN: Use pandas.core.common for None checks (pandas-dev#17816)
  BUG: set tz on DTI from fixed format HDFStore (pandas-dev#17844)
  RLS: v0.21.0rc1
  Whatsnew cleanup (pandas-dev#17858)
  DEPR: Deprecate the convert parameter completely (pandas-dev#17831)
  ...
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Successfully merging this pull request may close these issues.

Groupby with matching column and index name emits spurious warning