TST: make test_astype_categorical_to_other deterministic #26244

Merged
merged 1 commit into from
Apr 30, 2019

shoyer
Member

shoyer commented Apr 30, 2019

This test uses an unseeded random number generator, so it can
occasionally fail depending on the values that happen to be drawn.

In particular, I saw the following error:

=================================== FAILURES ===================================
______________ TestSeriesDtypes.test_astype_categorical_to_other _______________

self = <pandas.tests.series.test_dtypes.TestSeriesDtypes object at 0x7f223951b750>

    def test_astype_categorical_to_other(self):

        df = DataFrame({'value': np.random.randint(0, 10000, 100)})
        labels = ["{0} - {1}".format(i, i + 499) for i in range(0, 10000, 500)]
        cat_labels = Categorical(labels, labels)

        df = df.sort_values(by=['value'], ascending=True)
        df['value_group'] = pd.cut(df.value, range(0, 10500, 500),
                                   right=False, labels=cat_labels)

        s = df['value_group']
        expected = s
        tm.assert_series_equal(s.astype('category'), expected)
        tm.assert_series_equal(s.astype(CategoricalDtype()), expected)
        msg = (r"could not convert string to float: '(0 - 499|9500 - 9999)'|"
               r"invalid literal for float\(\): (0 - 499|9500 - 9999)")
        with pytest.raises(ValueError, match=msg):
>           s.astype('float64')
E           AssertionError: Pattern 'could not convert string to float: '(0 - 499|9500 - 9999)'|invalid literal for float\(\): (0 - 499|9500 - 9999)' not found in 'invalid literal for float(): 9000 - 9499'

Setting the random number seed should prevent this from happening again.
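
The diff itself is not reproduced in this conversation, so the following is only a minimal sketch of the idea, not the merged change. It assumes a private `np.random.RandomState` with an arbitrary seed of 0; the helper name `make_values` is hypothetical.

```python
import numpy as np
import pandas as pd


def make_values(seed=0):
    """Draw the test's 100 integers from a privately seeded generator."""
    # The seed value 0 is an arbitrary, hypothetical choice for this sketch.
    rng = np.random.RandomState(seed)
    return rng.randint(0, 10000, 100)


first = make_values()
second = make_values()
# Same seed, same draws: the input no longer changes between test runs.
print(np.array_equal(first, second))

# Bin the values the same way the test does and check the two end bins.
labels = ["{0} - {1}".format(i, i + 499) for i in range(0, 10000, 500)]
groups = pd.cut(pd.Series(first), list(range(0, 10500, 500)),
                right=False, labels=labels)
observed = set(groups.astype(str))
print("0 - 499" in observed, "9500 - 9999" in observed)
```

With a fixed seed, whether the first and last bins are populated is decided once and can be checked when the test is written, rather than varying from run to run.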

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff


codecov bot commented Apr 30, 2019

Codecov Report

Merging #26244 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26244      +/-   ##
==========================================
- Coverage   91.97%   91.96%   -0.01%     
==========================================
  Files         175      175              
  Lines       52368    52368              
==========================================
- Hits        48164    48160       -4     
- Misses       4204     4208       +4

Flag        Coverage Δ
#multiple   90.52% <ø> (ø) ⬆️
#single     40.69% <ø> (-0.15%) ⬇️

Impacted Files         Coverage Δ
pandas/io/gbq.py       78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py   96.9% <0%> (-0.12%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9feb3ad...a93b15e.

@WillAyd
Member

WillAyd commented Apr 30, 2019

Hmm, the message in your traceback doesn't match the current test. Were you getting this on master?

WillAyd added the "Testing" label (pandas testing functions or related to the test suite) on Apr 30, 2019
@shoyer
Member Author

shoyer commented Apr 30, 2019

Nope, we're still running pandas 0.24.1.

But it may be worth making this test deterministic regardless.
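
For a rough sense of how often an unseeded run could hit this, here is a back-of-the-envelope sketch. It rests on an assumption the thread does not confirm: that the failure occurs when none of the 100 uniform draws lands in one of the end bins, so the error message names a neighbouring label that the regex does not accept.

```python
# Assumption (not confirmed in the thread): the test fails when an end bin
# ("0 - 499" or "9500 - 9999") receives none of the random draws.
n_samples = 100   # size of the random frame in the test
n_bins = 20       # 10000 / 500 equal-width bins

# Probability that one specific bin stays empty.
p_one_empty = (1 - 1 / n_bins) ** n_samples

# Probability that at least one of the two end bins stays empty
# (inclusion-exclusion over the two events).
p_both_empty = (1 - 2 / n_bins) ** n_samples
p_either_end_empty = 2 * p_one_empty - p_both_empty

print("one given bin empty: {:.4%}".format(p_one_empty))
print("either end bin empty: {:.4%}".format(p_either_end_empty))
```

Under that assumption the failure rate is on the order of one run in a hundred or so, which would fit a test that fails only occasionally.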

@WillAyd
Member

WillAyd commented Apr 30, 2019

Yeah, not opposed to that. Do you see an easy way to get rid of the random call altogether, though? At first glance, the test may be easier to read if we used literal values with a smaller frame, if that's not too much extra effort.
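
A minimal sketch of what the literal-values variant could look like. The five input values are hypothetical, chosen only so that the first and last bins are both populated; this is not the code that ended up on master.

```python
import pandas as pd

# Hypothetical literal inputs covering the first, second, and last bins.
values = [0, 499, 500, 9500, 9999]
edges = list(range(0, 10500, 500))
labels = ["{0} - {1}".format(i, i + 499) for i in range(0, 10000, 500)]

# right=False makes each bin half-open on the right, e.g. [0, 500).
s = pd.cut(pd.Series(values), edges, right=False, labels=labels)
print(s.astype(str).tolist())

# Casting the string-labelled categories to float should still fail.
try:
    s.astype("float64")
except ValueError as exc:
    print("ValueError:", exc)
```

Because the inputs are fixed, the labels that appear in the series never change, so the test no longer depends on a random draw. The wording of the `ValueError` differs between pandas versions, as the mismatch noted earlier in this thread shows.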

@shoyer
Member Author

shoyer commented Apr 30, 2019 via email

@simonjayhawkins
Member

fixed on master, xref #25225

jreback added this to the 0.25.0 milestone Apr 30, 2019
jreback merged commit 2266911 into pandas-dev:master Apr 30, 2019
@jreback
Contributor

jreback commented Apr 30, 2019

thanks @shoyer
