TST: Remove subset of singleton fixtures #24873

Closed
wants to merge 14 commits

Conversation

jbrockmendel
Member

Broken off of #24769 where we have learned that some test behavior depends on whether or not a fixture is being used.

I claim that this is another point in favor of not using fixtures if there is a regular-python alternative (in this case, a "function"). Whenever with-fixture behavior is different from without-fixture behavior, it is definitely the latter that better represents user runtime environments. That is what we should be testing.

Contributor

@jreback jreback left a comment

see my comments on the other PR
this is a step backwards

@jbrockmendel
Member Author

this is a step backwards

@jreback you don't have to convince me of anything; clearly this won't be implemented if you're not on board.

But if you want to convince me that your way makes sense, you need to lay out a coherent argument for when we should and should not use fixtures.

  • For singleton fixtures like this, what upside do we get that we don't get from a simple function call?
  • Why use fixtures for some singletons but not for others?
    • Should we get rid of the tm.makeFooBar functions?
    • Many tests call pd.Timestamp.now(); should we make a fixture for that?
    • If "no" for either of these, why are they different from the fixtures affected by this PR?
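For concreteness, a minimal side-by-side sketch of the two forms being compared (the names and data here are illustrative, not the actual pandas fixtures):

import pandas as pd
import pytest

# Plain-function form: the test body shows exactly where the data comes from.
def get_empty_frame():
    return pd.DataFrame({})

def test_len_with_function():
    empty_frame = get_empty_frame()
    assert len(empty_frame) == 0

# Fixture form: the same object arrives through the test's argument list
# and is discovered by pytest rather than imported.
@pytest.fixture
def empty_frame():
    return pd.DataFrame({})

def test_len_with_fixture(empty_frame):
    assert len(empty_frame) == 0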

@jreback
Contributor

jreback commented Jan 22, 2019

have a look in tests/frame and u will easily see zillions of places these should be used but are simply not yet changed over from the old syntax

@jreback
Contributor

jreback commented Jan 22, 2019

yes we should really get rid of the makeFoo functions - they r a holdover from a time before fixtures - the only stopping block is they r semi public

Timestamp.now is too simple to be used as a fixture

@jbrockmendel
Member Author

have a look in tests/frame and u will easily see zillions of places these should be used but are simply not yet changed over from the old syntax

"should be used" is basically what I'm asking you to define in a non-tautological way.

If "the old syntax" is "a python function", then I see no upside to the change and would really like to understand the reasoning.

@codecov

codecov bot commented Jan 22, 2019

Codecov Report

Merging #24873 into master will decrease coverage by 49.5%.
The diff coverage is 21.62%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #24873       +/-   ##
===========================================
- Coverage   92.38%   42.88%   -49.51%     
===========================================
  Files         166      166               
  Lines       52412    52449       +37     
===========================================
- Hits        48423    22491    -25932     
- Misses       3989    29958    +25969
Flag        Coverage          Δ
#multiple   ?
#single     42.88% <21.62%>   (-0.02%) ⬇️

Impacted Files                       Coverage          Δ
pandas/util/testing.py               49.91% <21.62%>   (-38.13%) ⬇️
pandas/io/formats/latex.py           0% <0%>           (-100%) ⬇️
pandas/core/categorical.py           0% <0%>           (-100%) ⬇️
pandas/io/sas/sas_constants.py       0% <0%>           (-100%) ⬇️
pandas/tseries/plotting.py           0% <0%>           (-100%) ⬇️
pandas/tseries/converter.py          0% <0%>           (-100%) ⬇️
pandas/io/formats/html.py            0% <0%>           (-99.35%) ⬇️
pandas/core/groupby/categorical.py   0% <0%>           (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py            0% <0%>           (-91.17%) ⬇️
pandas/io/sas/sas_xport.py           0% <0%>           (-90.15%) ⬇️
... and 124 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 12bb6d0...14d2941.

@codecov

codecov bot commented Jan 22, 2019

Codecov Report

Merging #24873 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24873      +/-   ##
==========================================
+ Coverage   92.38%   92.39%   +<.01%     
==========================================
  Files         166      166              
  Lines       52412    52449      +37     
==========================================
+ Hits        48423    48461      +38     
+ Misses       3989     3988       -1
Flag        Coverage         Δ
#multiple   90.82% <100%>    (ø) ⬆️
#single     42.87% <21.62%>  (-0.03%) ⬇️

Impacted Files           Coverage        Δ
pandas/util/testing.py   88.52% <100%>   (+0.48%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 12bb6d0...14d2941.

@jreback
Contributor

jreback commented Jan 22, 2019

we prefer fixtures over functions almost always:

  • they are amenable to hierarchical grouping, locally one can override another definition if needed
  • you don’t have one giant file with imports of these; they can be grouped naturally in conftests which are automatically used
  • they provide nice composition with other fixtures and parameters
  • a test writer doesn’t have to worry about side effect / test setup or teardown
  • they are less verbose than a full function call inline in the test
  • they live in the arguments as a fixture should

i am just really repeating the goodness of using fixtures over functions - they are strictly better for testing

singletons are not really a special case - they provide nice utility; again w/o having to have special import files scattered about - which inevitably leads to users actually calling these functions

so I am -1 on this PR and any like it; we have been moving in the opposite direction for many years

fixtures, organized well are a huge boon to testing

sure you create singleton fixtures when you FIRST make a function into a fixture - this just means that you need to move more boilerplate code into fixtures
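A minimal sketch of the conftest-based organization and local override described above (the file layout and fixture contents are illustrative, not pandas' actual conftests):

# tests/conftest.py -- fixtures defined here are discovered automatically by
# pytest; no test module has to import them.
import pandas as pd
import pytest

@pytest.fixture
def float_frame():
    return pd.DataFrame({"A": [1.0, 2.0, 3.0]})

# tests/frame/conftest.py -- a subpackage conftest can override the definition
# above for the tests under tests/frame only.
@pytest.fixture
def float_frame():
    return pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

# tests/frame/test_example.py -- the fixture arrives via the argument list.
def test_has_column_a(float_frame):
    assert "A" in float_frame.columns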

@simonjayhawkins
Member

yes we should really get rid of the makeFoo functions - they r a holdover from a time before fixtures - the only stopping block is they r semi public

i think that "the only stopping block is they r semi public" is maybe the key to why it may be difficult to fully adopt the pytest idiom for pandas. There are just too many downstream packages.

in a pytestonic world i assume that all the functions in tm would be in conftest, even the assert functions and that pandas.util.testing would be no more.

personally i use the assert functions such as tm.assert_frame_equal to check a personal library and being able to import pandas.util.testing makes this easy.

Broken off of #24769 where we have learned that some test behavior depends on whether or not a fixture is being used.

I claim that this is another point in favor of not using fixtures...

i think that the opposite is true. the idea of fixtures is to make the testing more robust. fixtures are functions that are run, and their return value is the parameter used in the test. the default 'scope' (fixture scope is different from lexical scope etc.) for a fixture is function scope, which means that the fixture function is executed for each test, producing a 'new' value. compared to the unittest class style, which uses the setup method and makes 'fixtures' class attributes, the pytest approach is more robust: in theory we do not need to use copy() methods to 'protect' the fixture when executing tests.
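A minimal sketch of that point, assuming the default function scope so that each test receives a freshly built object and mutation in one test cannot leak into another (the names are illustrative):

import pandas as pd
import pytest

@pytest.fixture  # default scope="function": the body runs anew for every test
def ts():
    return pd.Series([1.0, 2.0, 3.0])

def test_can_mutate_freely(ts):
    ts.iloc[0] = 99.0            # no copy() needed to 'protect' shared state
    assert ts.iloc[0] == 99.0

def test_gets_a_fresh_object(ts):
    assert ts.iloc[0] == 1.0     # unaffected by the mutation in the other test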

@jbrockmendel
Member Author

in a pytestonic world

In cases where "pytestonic" and "pythonic" don't coincide, my default is to prefer pythonic. The point of the conversation (since the PR is going nowhere) is to understand why I should care about pytestonic.

you don’t have one giant file with imports of these

So when I read through a test I don't immediately know where a name is defined? I don't see this as a plus.

they provide nice composition with other fixtures and parameters

I am specifically focusing on fixtures without any parametrization or composition.

a test writer doesn’t have to worry about side effect / test setup or teardown

Same with a function call...

they are less verbose than a full function call inline in the test

Not for the ones that get inlined. Inlining empty_frame = DataFrame({}) is way less verbose than a fixture.

For non-inlined fixtures, I'll give you this one, sort of. But there is a tradeoff with clarity. Consider two implementations of the same test.

def test_td64ser_div_numeric_scalar(self, tdser):
    expected = Series(['29.5D', '29.5D', 'NaT'], dtype='timedelta64[ns]')
    result = tdser / 2
    tm.assert_equal(result, expected)

def test_td64ser_div_numeric_scalar(self):
    tdser = pd.Series(['59 Days', '59 Days', 'NaT'], dtype='timedelta64[ns]')
    expected = Series(['29.5D', '29.5D', 'NaT'], dtype='timedelta64[ns]')
    result = tdser / 2
    tm.assert_equal(result, expected)

The second (my preferred) has one extra line, and so is technically more verbose. But the reader doesn't have to check what tdser means or dig up where it was defined.

they live in the arguments as a fixture should

@jreback this is a circular argument.

Besides which, flake8 won't tell us when we have unused test arguments (which does occur in these files).

i am just really repeating the goodness of using fixtures over functions - they are strictly better for testing

@jreback this. This is what I'm trying to understand.

which inevitably leads to users actually calling these functions

The pythonic way to handle this is an underscore.
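A minimal sketch of the underscore convention being referred to (the helper name and contents are hypothetical):

import numpy as np
import pandas as pd

def _make_float_frame():
    # The leading underscore marks this as a private test helper, signalling
    # to downstream users that it is not part of any public testing API.
    return pd.DataFrame(np.random.randn(30, 4), columns=list("ABCD"))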

i think that the opposite is true. the idea of fixtures is to make the testing more robust. [... juxtaposition with unittest class style]

@simonjayhawkins The most optimistic interpretation of the disagreement is that we are not talking about the same thing, so let's double-check. I am talking about the fact that in #24769 there is a fixture float_frame and an identical function get_float_frame such that

def test_foo(self, float_frame):
   [...]

def test_foo2(self):
    float_frame = get_float_frame()
    [...]

has one passing and the other failing. The behavior of the first test is reachable AFAICT only via the pytest command line, which doesn't match any user's runtime environment. If it doesn't match the behavior a user sees, then it is testing behavior I couldn't care less about.

If "the opposite is true", please tell me why I should care about behavior that only occurs via the pytest CLI. (and if it is not the case that it occurs only via the pytest CLI, please help me fix the extant test failure in #24769)

How many other tests do we have that are silently testing behavior that doesn't match what users will actually see?

@jorisvandenbossche am I missing something here? I'm trying not to rule out the possibility that jreback knows something I don't, but am still completely missing it.

@TomAugspurger
Contributor

TomAugspurger commented Jan 22, 2019 via email

@gfyoung added the Refactor (Internal refactoring of code), Testing (pandas testing functions or related to the test suite), and Needs Discussion (Requires discussion from core team before further action) labels on Jan 22, 2019
@gfyoung
Member

gfyoung commented Jan 22, 2019

In cases where we aren't using the advanced features of a fixture (setup,
teardown, parametrization), a function call should be equivalent.

I agree with this sentiment as well. That being said, I have to agree that we probably haven't given these fixtures enough of a chance to be used elsewhere in the codebase.

In addition, I wonder if we really should be placing ALL of these functions in testing.py. Just like we don't put all of our fixtures in our top-level conftest.py, I don't see why we couldn't "scope" these objects to particular testing directories.

@gfyoung
Member

gfyoung commented Jan 22, 2019

BTW, I would like to remind everyone that @jbrockmendel had this original issue (#23701) around this exact topic that he recently re-opened. I think we should continue the conversation there if we are currently at an impasse.

@jreback
Contributor

jreback commented Jan 22, 2019

@jbrockmendel your example doesn't make much sense.

def test_foo(self, float_frame):
   [...]

def test_foo2(self):
    float_frame = get_float_frame()
    [...]

these should be identical, but we prefer the first: it is much more informative, scopes very nicely, plays nicely with fixtures, and doesn't have the test writer worry about setup/teardown.

if you write tests and run them using pytest I am not sure why you are even making a claim that 'calling directly' functions are better? I am not getting your point here.

@simonjayhawkins
Member

I agree with this sentiment as well. That being said, I have to agree that we probably haven't given these fixtures enough of a chance to be used elsewhere in the codebase.

interestingly, i've only just realised that there is only one fixture in the codebase that is used with @pytest.mark.parametrize(..., indirect=True)... and i'm guilty!

the case where i've used it is not a good example to demonstrate the power of this capability. perhaps some of the singleton fixtures in question should be constructed to allow indirect parametrization and more benefit could then be gained from using fixtures.
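A minimal sketch of indirect parametrization, where the parametrize values are routed through the fixture instead of being passed straight to the test (the fixture and its parameters are illustrative):

import pandas as pd
import pytest

@pytest.fixture
def float_frame(request):
    # With indirect=True, the parametrize value lands in request.param,
    # so the fixture decides how to build the object from it.
    nrows = request.param
    return pd.DataFrame({"A": range(nrows)}, dtype="float64")

@pytest.mark.parametrize("float_frame", [2, 8], indirect=True)
def test_frame_length(float_frame):
    assert len(float_frame) in (2, 8)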

@WillAyd
Member

WillAyd commented Jan 23, 2019

Late to the discussion but FWIW also -1 on the change, mostly citing this comment

@jorisvandenbossche
Member

I am also following @jbrockmendel's reasoning here. As Tom said, fixtures are really cool for the more advanced cases (ones that yield with a setup/teardown, parametrized ones, tempdir, ...).
But for simple test data generation I often find that the way we use fixtures makes it much harder to read and understand the test (what is the data being used, where is it defined, ...?)

So I would be +1 on a more selective use of fixtures.

@jorisvandenbossche
Member

these should be identical,

@jbrockmendel do we already understand why they are not identical (there were occasional failures?)

[jreback] these should be identical, but we prefer the first: it is much more informative, scopes very nicely, plays nicely with fixtures, and doesn't have the test writer worry about setup/teardown.

I think we are talking here about fixtures with simple data generation, so in that case there is no setup/teardown involved, it wouldn't need to integrate with other fixtures, .. (and personally I would argue it is also not more informative, but rather less, as the function-way shows where the function is coming from).

@jbrockmendel
Member Author

do we already understand why they are not identical (there were occasional failures?)

@jorisvandenbossche I don't have a handle on it. @simonjayhawkins is helping troubleshoot it over in #24769. Haven't given up on figuring it out, but in the interim I'm counting this as a point against unnecessary fixturization.

@WillAyd you reference this comment, can I ask what part of that you find compelling? I addressed it here.

That being said, I have to agree that we probably haven't given these fixtures enough of a chance to be used elsewhere in the codebase.

@gfyoung I'm not quite clear on what you mean by this. Any singleton fixture changed into a regular function can still be used elsewhere in the codebase.

@jreback re:

your example doesn't make much sense [...] these should be identical, but we prefer the first: it is much more informative, scopes very nicely, plays nicely with fixtures, and doesn't have the test writer worry about setup/teardown.
if you write tests and run them using pytest I am not sure why you are even making a claim that 'calling directly' functions are better? I am not getting your point here.

"should behave identically" --> yah, they should. and the fact that they dont in at least one case suggests we should be testing the one that matches users runtime envirionment.

"first is much more informative" --> in what way? The only difference in the code is that it is less obvious where the thing is defined.

"scopes very nicely" --> honestly not sure what this means

"plays nicely with fixtures" --> again, this is a circular argument.

"doesn't have the test writer worry about setup/teardown" --> again, this is no different from a regular function

"run them using pytest" --> usually I run them using pytest. but when one fails I want to be able to copy/paste it line-by-line in the REPL. And again, we have at least one case in which the called-via-pytest behavior doesn't match the regular-python-function behavior. This is bad.

I am not sure why you are even making a claim that 'calling directly' functions are better? I am not getting your point here.

Zen of Python ring a bell?

@gfyoung
Member

gfyoung commented Jan 23, 2019

@jbrockmendel

Any singleton fixture changed into a regular function can still be used elsewhere in the codebase.

My point is that while I agree that truly singleton fixtures should be replaced, I'm not sure the current fixtures that we're removing are truly singletons, given how non-pytest-idiomatic some of the codebase still is.

"run them using pytest" --> usually I run them using pytest. but when one fails I want to be able to copy/paste it line-by-line in the REPL.

This sounds like you just don't like fixtures in any way, eh? 😉

BTW, I should point out that the REPL isn't the only way to debug these tests (print statements, pdb).

@jbrockmendel
Member Author

I'm not sure the current fixtures that we're removing are truly singletons

Huh?

This sounds like you just don't like fixtures in any way, eh? 😉

I've come around to understand their usefulness in specific use cases. These are not those cases.

@jorisvandenbossche
Member

I'm not sure the current fixtures that we're removing are truly singletons,

I think we are again speaking about a different sort of "singleton" ? (see #24769 (comment))

@h-vetinari
Contributor

Regarding singletons-or-not, and the larger context of dependency injection in the frame/series tests (currently mostly through TestData-fixturesque-attributes), I'll quote my comment from #24769 that hasn't received a response there:

@jreback @jbrockmendel @simonjayhawkins
The reason most of the fixtures coming out of TestData are singletons at all is that there is an ongoing (cough) transition from the old TestData-attributes-as-fixtures towards proper fixtures; see #22236 (where @jreback asked me to open the following), #22471, #22550.
I haven't touched these issues that much anymore, since it was supposed to be a "good first issue" (and had some response along those lines for a while), and I was busy with other PRs.

That being said, I do find it hard to parse a test when it requires specific properties of a fixture to work (e.g. specific column names or values), since one then has to go check the respective conftest.py to figure out what's happening.

In that sense, I'd argue for fixtures that are minimal to the requirements of the test they're used in. If that means they're module-specific (and get moved there!), that's fine for me.

Fixtures are excellent for parametrisation though, and we should probably use a whole lot more of that, to increase breadth and the chance to catch edge cases early.

OTOH, if widespread fixturization is not seen as a positive, then #22471 #22550 should be re-discussed and maybe closed.

@jbrockmendel
Member Author

@h-vetinari I have no objection to moving away from the class attributes usage. My problems are specifically with

@pytest.fixture
def ts():
    return tm.makeTimeSeries()

Fixtures which actually use any of the useful features of pytest.fixture, e.g. parametrization, are orthogonal to the issue/PR(s).
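For contrast, the plain-function alternative being advocated would look something like this sketch (the wrapper name is hypothetical; tm.makeTimeSeries is the existing helper):

import pandas.util.testing as tm

def get_ts():
    # Plain-function equivalent of the fixture above; tests call it explicitly
    # instead of receiving it as an argument.
    return tm.makeTimeSeries()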

@h-vetinari
Contributor

@jbrockmendel
I'm broadly in agreement, but only wanted to point out that the unused fixtures you mention have been put in place by #22236 to start the transition #22471 (for frames; similarly by #22526 / #22550 for Series).

I'll note that while those PRs/issues have me as the OP, the fixturisation was orthogonal to my goal with #22236, and is the result of review requirements.
