TST: Remove subset of singleton fixtures #24873

Closed
wants to merge 14 commits

Conversation

jbrockmendel
Member

Broken off of #24769 where we have learned that some test behavior depends on whether or not a fixture is being used.

I claim that this is another point in favor of not using fixtures if there is a regular-python alternative (in this case, a "function"). Whenever with-fixture behavior is different from without-fixture behavior, it is definitely the latter that better represents user runtime environments. That is what we should be testing.

Contributor

@jreback jreback left a comment

see my comments on the other PR
this is a step backwards

@jbrockmendel
Member Author

this is a step backwards

@jreback you don't have to convince me of anything; clearly this won't be implemented if you're not on board.

But if you want to convince me that your way makes sense, you need to lay out a coherent argument for when we should and should not use fixtures.

  • For singleton fixtures like this, what upside do we get that we don't get from a simple function call?
  • Why use fixtures for some singletons but not for others?
    • Should we get rid of the tm.makeFooBar functions?
    • Many tests call pd.Timestamp.now(); should we make a fixture for that?
    • If "no" for either of these, why are they different from the fixtures affected by this PR?
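For concreteness, a minimal side-by-side sketch of the two forms being compared (the names and data here are illustrative, not the actual pandas fixtures):

import pandas as pd
import pytest

# Plain-function form: the test body shows exactly where the data comes from.
def get_empty_frame():
    return pd.DataFrame({})

def test_len_with_function():
    empty_frame = get_empty_frame()
    assert len(empty_frame) == 0

# Fixture form: the same object arrives through the test's argument list
# and is discovered by pytest rather than imported.
@pytest.fixture
def empty_frame():
    return pd.DataFrame({})

def test_len_with_fixture(empty_frame):
    assert len(empty_frame) == 0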

@jreback
Contributor

jreback commented Jan 22, 2019

have a look in tests/frame and u will easily see zillions of places these should be used but are simply not yet changed over from the old syntax

@jreback
Contributor

jreback commented Jan 22, 2019

yes we should really get rid of the makeFoo functions - they r a holdover from a time before fixtures - the only stopping block is they r semi public

Timestamp.now is too simple to be used as a fixture

@jbrockmendel
Member Author

have a look in tests/frame and u will easily see zillions of places these should be used but are simply not yet changed over from the old syntax

"should be used" is basically what I'm asking you to define in a non-tautological way.

If "the old syntax" is "a python function", then I see no upside to the change and would really like to understand the reasoning.

@codecov

codecov bot commented Jan 22, 2019

Codecov Report

Merging #24873 into master will decrease coverage by 49.5%.
The diff coverage is 21.62%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #24873       +/-   ##
===========================================
- Coverage   92.38%   42.88%   -49.51%     
===========================================
  Files         166      166               
  Lines       52412    52449       +37     
===========================================
- Hits        48423    22491    -25932     
- Misses       3989    29958    +25969
Flag        Coverage          Δ
#multiple   ?
#single     42.88% <21.62%>   (-0.02%) ⬇️

Impacted Files                       Coverage          Δ
pandas/util/testing.py               49.91% <21.62%>   (-38.13%) ⬇️
pandas/io/formats/latex.py           0% <0%>           (-100%) ⬇️
pandas/core/categorical.py           0% <0%>           (-100%) ⬇️
pandas/io/sas/sas_constants.py       0% <0%>           (-100%) ⬇️
pandas/tseries/plotting.py           0% <0%>           (-100%) ⬇️
pandas/tseries/converter.py          0% <0%>           (-100%) ⬇️
pandas/io/formats/html.py            0% <0%>           (-99.35%) ⬇️
pandas/core/groupby/categorical.py   0% <0%>           (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py            0% <0%>           (-91.17%) ⬇️
pandas/io/sas/sas_xport.py           0% <0%>           (-90.15%) ⬇️
... and 124 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 12bb6d0...14d2941.

@codecov

codecov bot commented Jan 22, 2019

Codecov Report

Merging #24873 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24873      +/-   ##
==========================================
+ Coverage   92.38%   92.39%   +<.01%     
==========================================
  Files         166      166              
  Lines       52412    52449      +37     
==========================================
+ Hits        48423    48461      +38     
+ Misses       3989     3988       -1
Flag        Coverage         Δ
#multiple   90.82% <100%>    (ø) ⬆️
#single     42.87% <21.62%>  (-0.03%) ⬇️

Impacted Files           Coverage        Δ
pandas/util/testing.py   88.52% <100%>   (+0.48%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 12bb6d0...14d2941.

@jreback
Contributor

jreback commented Jan 22, 2019

we prefer fixtures over functions almost always:

  • they are amenable to hierarchical grouping, locally one can override another definition if needed
  • you don’t have one giant file with imports of these; they can be grouped naturally in conftests which are automatically used
  • they provide nice composition with other fixtures and parameters
  • a test writer doesn’t have to worry about side effect / test setup or teardown
  • they are less verbose than a full function call inline in the test
  • they live in the arguments as a fixture should

i am just really repeating the goodness of using fixtures over functions - they are strictly better for testing

singletons are not really a special case - they provide nice utility; again w/o having to have special import files scattered about - which inevitably leads to users actually calling these functions

so I am -1 on this PR and any like it; we have been moving in the opposite direction for many years

fixtures, organized well are a huge boon to testing

sure you create singleton fixtures when you FIRST make a function into a fixture - this just means that you need to move more boilerplate code into fixtures
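A minimal sketch of the conftest-based organization and local override described above (the file layout and fixture contents are illustrative, not pandas' actual conftests):

# tests/conftest.py -- fixtures defined here are discovered automatically by
# pytest; no test module has to import them.
import pandas as pd
import pytest

@pytest.fixture
def float_frame():
    return pd.DataFrame({"A": [1.0, 2.0, 3.0]})

# tests/frame/conftest.py -- a subpackage conftest can override the definition
# above for the tests under tests/frame only.
@pytest.fixture
def float_frame():
    return pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

# tests/frame/test_example.py -- the fixture arrives via the argument list.
def test_has_column_a(float_frame):
    assert "A" in float_frame.columns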

@simonjayhawkins
Member

yes we should really get rid of the makeFoo functions - they r a holdover from a time before fixtures - the only stopping block is they r semi public

i think that "the only stopping block is they r semi public" is maybe the key to why it may be difficult to fully adopt the pytest idiom for pandas. There are just too many downstream packages.

in a pytestonic world i assume that all the functions in tm would be in conftest, even the assert functions and that pandas.util.testing would be no more.

personally i use the assert functions such as tm.assert_frame_equal to check a personal library and being able to import pandas.util.testing makes this easy.

Broken off of #24769 where we have learned that some test behavior depends on whether or not a fixture is being used.

I claim that this is another point in favor of not using fixtures...

i think that the opposite is true. the idea of fixtures is to make the testing more robust. fixtures are functions that are run, and their return value is the parameter used in the test. the default 'scope' (fixture scope is different from lexical scope etc.) for a fixture is function scope, which means that the fixture function is executed for each test, producing a 'new' value. compared to the unittest class style, which uses the setup method and makes 'fixtures' class attributes, the pytest approach is more robust: in theory we do not need to use copy() methods to 'protect' the fixture when executing tests.
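A minimal sketch of that point, assuming the default function scope so that each test receives a freshly built object and mutation in one test cannot leak into another (the names are illustrative):

import pandas as pd
import pytest

@pytest.fixture  # default scope="function": the body runs anew for every test
def ts():
    return pd.Series([1.0, 2.0, 3.0])

def test_can_mutate_freely(ts):
    ts.iloc[0] = 99.0            # no copy() needed to 'protect' shared state
    assert ts.iloc[0] == 99.0

def test_gets_a_fresh_object(ts):
    assert ts.iloc[0] == 1.0     # unaffected by the mutation in the other test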

@jbrockmendel
Member Author

in a pytestonic world

In cases where "pytestonic" and "pythonic" don't coincide, my default is to prefer pythonic. The point of the conversation (since the PR is going nowhere) is to understand why I should care about pytestonic.

you don’t have one giant file with imports of these

So when I read through a test I don't immediately know where a name is defined? I don't see this as a plus.

they provide nice composition with other fixtures and parameters

I am specifically focusing on fixtures without any parametrization or composition.

a test writer doesn’t have to worry about side effect / test setup or teardown

Same with a function call...

they are less verbose than a full function call inline in the test

Not for the ones that get inlined. Inlining empty_frame = DataFrame({}) is way less verbose than a fixture.

For non-inlined fixtures, I'll give you this one, sort of. But there is a tradeoff with clarity. Consider two implementations of the same test.

def test_td64ser_div_numeric_scalar(self, tdser):
    expected = Series(['29.5D', '29.5D', 'NaT'], dtype='timedelta64[ns]')
    result = tdser / 2
    tm.assert_equal(result, expected)

def test_td64ser_div_numeric_scalar(self):
    tdser = pd.Series(['59 Days', '59 Days', 'NaT'], dtype='timedelta64[ns]')
    expected = Series(['29.5D', '29.5D', 'NaT'], dtype='timedelta64[ns]')
    result = tdser / 2
    tm.assert_equal(result, expected)

The second (my preferred) has one extra line, and so is technically more verbose. But the reader doesn't have to check what tdser means or dig up where it was defined.

they live in the arguments as a fixture should

@jreback this is a circular argument.

Besides which, flake8 won't tell us when we have unused test arguments (which does occur in these files).

i am just really repeating the goodness of using fixtures over functions - they are strictly better for testing

@jreback this. This is what I'm trying to understand.

which inevitably leads to users actually calling these functions

The pythonic way to handle this is an underscore.
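A minimal sketch of the underscore convention being referred to (the helper name and contents are hypothetical):

import numpy as np
import pandas as pd

def _make_float_frame():
    # The leading underscore marks this as a private test helper, signalling
    # to downstream users that it is not part of any public testing API.
    return pd.DataFrame(np.random.randn(30, 4), columns=list("ABCD"))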

i think that the opposite is true. the idea of fixtures is to make the testing more robust. [... juxtaposition with unittest class style]

@simonjayhawkins The most optimistic interpretation of the disagreement is that we are not talking about the same thing, so let's double-check. I am talking about the fact that in #24769 there is a fixture float_frame and an identical function get_float_frame such that

def test_foo(self, float_frame):
   [...]

def test_foo2(self):
    float_frame = get_float_frame()
    [...]

has one passing and the other failing. The behavior of the first test is reachable AFAICT only via the pytest command line, which doesn't match any user's runtime environment. If it doesn't match the behavior a user sees, then it is testing behavior I couldn't care less about.

If "the opposite is true", please tell me why I should care about behavior that only occurs via the pytest CLI. (and if it is not the case that it occurs only via the pytest CLI, please help me fix the extant test failure in #24769)

How many other tests do we have that are silently testing behavior that doesn't match what users will actually see?

@jorisvandenbossche am I missing something here? I'm trying not to rule out the possibility that jreback knows something I don't, but am still completely missing it.

@TomAugspurger
Contributor

TomAugspurger commented Jan 22, 2019 via email

@gfyoung added the Refactor (Internal refactoring of code), Testing (pandas testing functions or related to the test suite), and Needs Discussion (Requires discussion from core team before further action) labels on Jan 22, 2019
@gfyoung
Member

gfyoung commented Jan 22, 2019

In cases where we aren't using the advanced features of a fixture (setup,
teardown, parametrization), a function call should be equivalent.

I agree with this sentiment as well. That being said, I have to agree that we probably haven't given these fixtures enough of a chance to be used elsewhere in the codebase.

In addition, I wonder if we really should be placing ALL of these functions in testing.py. Just like we don't put all of our fixtures in our top-level conftest.py, I don't see why we couldn't "scope" these objects to particular testing directories.

@gfyoung
Member

gfyoung commented Jan 22, 2019

BTW, I would like to remind everyone that @jbrockmendel had this original issue (#23701) around this exact topic that he recently re-opened. I think we should continue the conversation there if we are currently at an impasse.

@jreback
Contributor

jreback commented Jan 22, 2019

@jbrockmendel your example doesn't make much sense.

def test_foo(self, float_frame):
   [...]

def test_foo2(self):
    float_frame = get_float_frame()
    [...]

these should be identical, but we prefer the first: it is much more informative, scopes very nicely, plays nicely with fixtures, and doesn't have the test writer worry about setup/teardown.

if you write tests and run them using pytest I am not sure why you are even making a claim that 'calling directly' functions are better? I am not getting your point here.

@simonjayhawkins
Member

I agree with this sentiment as well. That being said, I have to agree that we probably haven't given these fixtures enough of a chance to be used elsewhere in the codebase.

interestingly, i've only just realised that there is only one fixture in the codebase that is used with @pytest.mark.parametrize(..., indirect=True)... and i'm guilty!

the case where i've used it is not a good example to demonstrate the power of this capability. perhaps some of the singleton fixtures in question should be constructed to allow indirect parametrization and more benefit could then be gained from using fixtures.
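A minimal sketch of indirect parametrization, where the parametrize values are routed through the fixture instead of being passed straight to the test (the fixture and its parameters are illustrative):

import pandas as pd
import pytest

@pytest.fixture
def float_frame(request):
    # With indirect=True, the parametrize value lands in request.param,
    # so the fixture decides how to build the object from it.
    nrows = request.param
    return pd.DataFrame({"A": range(nrows)}, dtype="float64")

@pytest.mark.parametrize("float_frame", [2, 8], indirect=True)
def test_frame_length(float_frame):
    assert len(float_frame) in (2, 8)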

@WillAyd
Member

WillAyd commented Jan 23, 2019

Late to the discussion but FWIW also -1 on the change, mostly citing this comment

@jorisvandenbossche
Member

I am also following @jbrockmendel's reasoning here. As Tom said, fixtures are really cool for the more advanced cases (ones that yield with a setup/teardown, parametrized ones, tempdir, ...).
But for simple test data generation I often find that the way we use fixtures makes it much harder to read and understand the test (what is the data being used, where is it defined, ...?)

So I would be +1 on a more selective use of fixtures.

@jorisvandenbossche
Member

these should be identical,

@jbrockmendel do we already understand why they are not identical (there were occasional failures?)

[jreback] these should be identical, but we prefer the first: it is much more informative, scopes very nicely, plays nicely with fixtures, and doesn't have the test writer worry about setup/teardown.

I think we are talking here about fixtures with simple data generation, so in that case there is no setup/teardown involved, it wouldn't need to integrate with other fixtures, .. (and personally I would argue it is also not more informative, but rather less, as the function-way shows where the function is coming from).

@jbrockmendel
Member Author

do we already understand why they are not identical (there were occasional failures?)

@jorisvandenbossche I don't have a handle on it. @simonjayhawkins is helping troubleshoot it over in #24769. Haven't given up on figuring it out, but in the interim I'm counting this as a point against unnecessary fixturization.

@WillAyd you reference this comment, can I ask what part of that you find compelling? I addressed it here.

That being said, I have to agree that we probably haven't given these fixtures enough of a chance to be used elsewhere in the codebase.

@gfyoung I'm not quite clear on what you mean by this. Any singleton fixture changed into a regular function can still be used elsewhere in the codebase.

@jreback re:

your example doesn't make much sense [...] these should be identical, but we prefer the first: it is much more informative, scopes very nicely, plays nicely with fixtures, and doesn't have the test writer worry about setup/teardown.
if you write tests and run them using pytest I am not sure why you are even making a claim that 'calling directly' functions are better? I am not getting your point here.

"should behave identically" --> yah, they should. and the fact that they dont in at least one case suggests we should be testing the one that matches users runtime envirionment.

"first is much more informative" --> in what way? The only difference in the code is that it is less obvious where the thing is defined.

"scopes very nicely" --> honestly not sure what this means

"plays nicely with fixtures" --> again, this is a circular argument.

"doesn't have the test writer worry about setup/teardown" --> again, this is no different from a regular function

"run them using pytest" --> usually I run them using pytest. but when one fails I want to be able to copy/paste it line-by-line in the REPL. And again, we have at least one case in which the called-via-pytest behavior doesn't match the regular-python-function behavior. This is bad.

I am not sure why you are even making a claim that 'calling directly' functions are better? I am not getting your point here.

Zen of Python ring a bell?

@gfyoung
Member

gfyoung commented Jan 23, 2019

@jbrockmendel

Any singleton fixture changed into a regular function can still be used elsewhere in the codebase.

My point is that while I agree that truly singleton fixtures should be replaced, I'm not sure the current fixtures that we're removing are truly singletons, given how non-pytest-idiomatic some of the codebase still is.

"run them using pytest" --> usually I run them using pytest. but when one fails I want to be able to copy/paste it line-by-line in the REPL.

This sounds like you just don't like fixtures in any way, eh? 😉

BTW, I should point out that the REPL isn't the only way to debug these tests (print statements, pdb).

@jbrockmendel
Member Author

I'm not sure the current fixtures that we're removing are truly singletons

Huh?

This sounds like you just don't like fixtures in any way, eh? 😉

I've come around to understand their usefulness in specific use cases. These are not those cases.

@jorisvandenbossche
Member

I'm not sure the current fixtures that we're removing are truly singletons,

I think we are again speaking about a different sort of "singleton" ? (see #24769 (comment))

@h-vetinari
Contributor

Regarding singletons-or-not, and the larger context of dependency injection in the frame/series tests (currently mostly through TestData-fixturesque-attributes), I'll quote my comment from #24769 that hasn't received a response there:

@jreback @jbrockmendel @simonjayhawkins
The reason most of the fixtures coming out of TestData are singletons at all is that there is an ongoing (cough) transition from the old TestData-attributes-as-fixtures towards proper fixtures; see #22236 (where @jreback asked me to open the following), #22471, #22550.
I haven't touched these issues that much anymore, since it was supposed to be a "good first issue" (and had some response along those lines for a while), and I was busy with other PRs.

That being said, I do find it hard to parse a test when it requires specific properties of a fixture to work (e.g. specific column names or values), since one then has to go check the respective conftest.py to figure out what's happening.

In that sense, I'd argue for fixtures that are minimal to the requirements of the test they're used in. If that means they're module-specific (and get moved there!), that's fine for me.

Fixtures are excellent for parametrisation though, and we should probably use a whole lot more of that, to increase breadth and the chance to catch edge cases early.

OTOH, if widespread fixturization is not seen as a positive, then #22471 #22550 should be re-discussed and maybe closed.

@jbrockmendel
Member Author

@h-vetinari I have no objection to moving away from the class attributes usage. My problems are specifically with

@pytest.fixture
def ts():
    return tm.makeTimeSeries()

Fixtures which actually use any of the useful features of pytest.fixture, e.g. parametrization, are orthogonal to the issue/PR(s).
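For contrast, the plain-function alternative being advocated would look something like this sketch (the wrapper name is hypothetical; tm.makeTimeSeries is the existing helper):

import pandas.util.testing as tm

def get_ts():
    # Plain-function equivalent of the fixture above; tests call it explicitly
    # instead of receiving it as an argument.
    return tm.makeTimeSeries()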

@h-vetinari
Contributor

@jbrockmendel
I'm broadly in agreement, but only wanted to point out that the unused fixtures you mention have been put in place by #22236 to start the transition #22471 (for frames; similarly by #22526 / #22550 for Series).

I'll note that while those PRs/issues have me as the OP, the fixturisation was orthogonal to my goal with #22236, and is the result of review requirements.
