Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix Series.nlargest for integer boundary values #21432

Merged
merged 4 commits into from
Jun 15, 2018

Conversation

jschendel
Copy link
Member

@jschendel jschendel commented Jun 11, 2018

Also added some similar tests for float and datetimelike dtypes to ensure that the behavior is as desired.

@jschendel jschendel added Bug Dtype Conversions Unexpected or buggy dtype conversions Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Jun 11, 2018
@jschendel jschendel added this to the 0.23.2 milestone Jun 11, 2018
result = getattr(s, method)(3)
expected_idxr = [0, 1, 2] if method == 'nsmallest' else [3, 2, 1]
expected = s.loc[expected_idxr]
tm.assert_series_equal(result, expected)

This comment was marked as resolved.

This comment was marked as resolved.

@gfyoung
Copy link
Member

gfyoung commented Jun 11, 2018

No whatsnew entry for the time being since this was tagged as 0.23.2, and no whatsnew has been created for that release yet.

@jschendel : Not a problem. Just wait for #21433 and rebase.

@gfyoung
Copy link
Member

gfyoung commented Jun 12, 2018

#21433 has been merged. Feel free to rebase to add your whatsnew.

@jschendel jschendel force-pushed the nlargest-integer-boundary branch from c52bcf7 to 73595b3 Compare June 12, 2018 00:50
@codecov
Copy link

codecov bot commented Jun 12, 2018

Codecov Report

Merging #21432 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21432      +/-   ##
==========================================
+ Coverage   91.89%    91.9%   +<.01%     
==========================================
  Files         153      153              
  Lines       49604    49606       +2     
==========================================
+ Hits        45584    45588       +4     
+ Misses       4020     4018       -2
Flag Coverage Δ
#multiple 90.3% <100%> (ø) ⬆️
#single 41.88% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/algorithms.py 94.83% <100%> (+0.01%) ⬆️
pandas/util/testing.py 84.81% <0%> (+0.2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf1c3dc...fb9d532. Read the comment docs.

Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

cc @jreback

@@ -1946,6 +1946,10 @@ def test_mode_sortwarning(self):

class TestNLargestNSmallest(object):

@pytest.fixture(params=['nlargest', 'nsmallest'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would not object to moving this fixture to conftest (at a high level) and can replace usage in frame/test_analytics.py as well.

@@ -2028,6 +2032,40 @@ def test_n(self, n):
expected = s.sort_values().head(n)
assert_series_equal(result, expected)

def _check_nselect_boundary(self, vals, dtype, method):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name: assert_check_nselect_boundary

can be a module level function

expected = s.loc[expected_idxr]
tm.assert_series_equal(result, expected)

@pytest.mark.parametrize('dtype', [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a fixture for these types (if not, can you create one in conftest)

Copy link
Member

@gfyoung gfyoung Jun 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback : @jschendel brought up a good point in #21456 (comment) that we might have some intersection with this PR and #21456. How best to work that out?

Copy link
Member Author

@jschendel jschendel Jun 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I've modified the fixtures I wrote to exactly match @gfyoung's. Were basically identical anyways. I've got all my review related changes ready, so can push whenever.

vals = [min_val, min_val + 1, max_val - 1, max_val]
self._check_nselect_boundary(vals, dtype, method)

@pytest.mark.parametrize('dtype', ['float16', 'float32', 'float64'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls create a fixture (drop float16 we barely support this)

@jschendel jschendel force-pushed the nlargest-integer-boundary branch from 73595b3 to fb9d532 Compare June 14, 2018 23:47
@jschendel
Copy link
Member Author

jschendel commented Jun 14, 2018

Addressed review comments; copied @gfyoung's dtype fixtures where appropriate.

Did some additional cleaning of frame/test_analytics.py to use one of the new fixtures I created, and to use multiple instances of parametrize instead of jamming everything into a single one with product.

@jreback jreback merged commit ec5956e into pandas-dev:master Jun 15, 2018
@jreback
Copy link
Contributor

jreback commented Jun 15, 2018

thanks @jschendel

@jreback
Copy link
Contributor

jreback commented Jun 15, 2018

can you create an issue to use these fixtures project wide?

@jschendel jschendel deleted the nlargest-integer-boundary branch June 15, 2018 19:03
@jschendel
Copy link
Member Author

can you create an issue to use these fixtures project wide?

Created #21500 for this

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
jorisvandenbossche pushed a commit that referenced this pull request Jun 29, 2018
jorisvandenbossche pushed a commit that referenced this pull request Jul 2, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: .nlargest with unsigned integers
4 participants