TST: make test_astype_categorical_to_other deterministic #26244

shoyer · 2019-04-30T00:01:21Z

This test draws its input from an unseeded random number generator, so it
can fail intermittently depending on which values happen to be drawn.

In particular, I saw the following error:

=================================== FAILURES ===================================
______________ TestSeriesDtypes.test_astype_categorical_to_other _______________

self = <pandas.tests.series.test_dtypes.TestSeriesDtypes object at 0x7f223951b750>

    def test_astype_categorical_to_other(self):

        df = DataFrame({'value': np.random.randint(0, 10000, 100)})
        labels = ["{0} - {1}".format(i, i + 499) for i in range(0, 10000, 500)]
        cat_labels = Categorical(labels, labels)

        df = df.sort_values(by=['value'], ascending=True)
        df['value_group'] = pd.cut(df.value, range(0, 10500, 500),
                                   right=False, labels=cat_labels)

        s = df['value_group']
        expected = s
        tm.assert_series_equal(s.astype('category'), expected)
        tm.assert_series_equal(s.astype(CategoricalDtype()), expected)
        msg = (r"could not convert string to float: '(0 - 499|9500 - 9999)'|"
               r"invalid literal for float\(\): (0 - 499|9500 - 9999)")
        with pytest.raises(ValueError, match=msg):
>           s.astype('float64')
E           AssertionError: Pattern 'could not convert string to float: '(0 - 499|9500 - 9999)'|invalid literal for float\(\): (0 - 499|9500 - 9999)' not found in 'invalid literal for float(): 9000 - 9499'

Setting the random number seed makes the generated values the same on every run, so this failure should no longer occur.
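
As a rough illustration of why seeding helps, here is a minimal sketch. It is not the diff from this PR: the helper name `draw_values` and the seed `42` are hypothetical, chosen only for the example. It shows that a seeded generator returns the same 100 values on every call, so whatever the test observes once it will observe every time.

```python
import numpy as np


def draw_values(seed):
    """Return 100 integers in [0, 10000) from a generator seeded with `seed`.

    `draw_values` is a hypothetical helper for illustration; the test in this
    PR calls np.random.randint directly.
    """
    # RandomState gives an isolated generator, so seeding it does not
    # disturb the global np.random state used elsewhere.
    rng = np.random.RandomState(seed)
    return rng.randint(0, 10000, 100)


first = draw_values(42)
second = draw_values(42)

# Same seed, same draws: the input to the test no longer varies between runs.
print(np.array_equal(first, second))
```

Calling `np.random.seed(...)` at the top of the test would have a similar effect, at the cost of touching the global generator state.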

- tests added / passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

codecov · 2019-04-30T00:43:02Z

Codecov Report

Merging #26244 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26244      +/-   ##
==========================================
- Coverage   91.97%   91.96%   -0.01%     
==========================================
  Files         175      175              
  Lines       52368    52368              
==========================================
- Hits        48164    48160       -4     
- Misses       4204     4208       +4

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.52% <ø> (ø)` | ⬆️ |
| #single | `40.69% <ø> (-0.15%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/io/gbq.py | `78.94% <0%> (-10.53%)` | ⬇️ |
| pandas/core/frame.py | `96.9% <0%> (-0.12%)` | ⬇️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9feb3ad...a93b15e. Read the comment docs.

WillAyd · 2019-04-30T01:00:26Z

Hmm the message on your traceback doesn't match the current test. Were you getting this on master?

shoyer · 2019-04-30T01:07:45Z

Nope, we're still running pandas 0.24.1.

But it may be worth making this test deterministic regardless....
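
For a sense of how often an unseeded run could go wrong, here is a back-of-the-envelope sketch. It rests on an assumption that is not confirmed anywhere in this thread: that the failure happens when none of the 100 draws lands in one particular bin (the traceback reports `9000 - 9499` where the pattern expected `9500 - 9999`, which would be consistent with an empty top bin). Under that assumption, each draw misses a given 500-wide bin with probability 19/20.

```python
# Numbers taken from the test in the traceback above.
n_draws = 100          # np.random.randint(0, 10000, 100)
n_bins = 20            # range(0, 10000, 500) gives 20 bins of width 500

# Probability that a single uniform draw misses one specific bin.
p_miss_once = 1 - 1 / n_bins

# Probability that all draws miss that bin (draws are independent).
p_bin_empty = p_miss_once ** n_draws

print("P(specific bin is empty) = {:.4f}".format(p_bin_empty))
```

That works out to a bit under 1% per run for any one bin, which would fit a test that passes almost always and fails once in a while.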

WillAyd · 2019-04-30T01:14:07Z

Yea, not opposed to that. Do you see an easy way to get rid of the random call altogether, though? At first glance it may be easier to read if we used literal values with a smaller frame, if it's not too much extra effort.

shoyer · 2019-04-30T03:09:51Z

I'm sure literal values with a smaller frame would be even better, but I don't have the time to dive into the details of this test.
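
For reference, a minimal sketch of what "literal values with a smaller frame" might look like. This is an illustration only, not what was merged: the five values, the two bins, and the helper name `build_value_groups` are all made up for the example, and the exact error message from `astype('float64')` differs between pandas versions, so the sketch only checks that a `ValueError` is raised.

```python
import pandas as pd


def build_value_groups():
    """Bin a handful of literal values into two labelled groups.

    Hypothetical stand-in for the randomly generated frame in the test.
    """
    df = pd.DataFrame({"value": [5, 120, 499, 500, 999]})
    labels = ["0 - 499", "500 - 999"]
    # right=False makes the bins [0, 500) and [500, 1000).
    df["value_group"] = pd.cut(df["value"], [0, 500, 1000],
                               right=False, labels=labels)
    return df["value_group"]


s = build_value_groups()

# Casting a categorical to 'category' again should leave it unchanged.
same_after_cast = s.astype("category").equals(s)

# The string labels cannot be converted to floats.
try:
    s.astype("float64")
    raised = False
except ValueError:
    raised = True

print(list(s), same_after_cast, raised)
```

With fixed inputs, both bins are always populated, so the outcome does not depend on a seed at all.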

simonjayhawkins · 2019-04-30T08:40:38Z

fixed on master, xref #25225

jreback · 2019-04-30T10:25:44Z

thanks @shoyer

WillAyd added the Testing label (pandas testing functions or related to the test suite) Apr 30, 2019

jreback added this to the 0.25.0 milestone Apr 30, 2019

jreback merged commit 2266911 into pandas-dev:master Apr 30, 2019
