Test suite improvements #6364

Closed · 5 of 8 tasks
stinodego opened this issue Jan 22, 2023 · 5 comments
Labels: test (Related to the test suite)

@stinodego (Member) commented Jan 22, 2023

I am planning to do some work on improving the test suite. I will track my plans and progress in this issue, so that others may comment.

  • Use pytest-xdist to parallelize running our tests
    • We run a lot of tests; this will help keep run times manageable.
    • This currently crashes on some IO tests. I plan to move those tests out of the 'regular' flow, so that we can run make test in parallel.
  • Use pytest custom markers where appropriate
    • This will make it easier to run specific parts of the test suite, for example "everything except parametric tests".
    • Unfortunately, I haven't found a way to simply mark all tests in a directory, so we'll have to see if this really adds utility (one possible approach is sketched after this list).
  • Rethink the setup of IO tests
    • I think we can improve here by using TemporaryDirectory (or pytest's tmp_path fixture) and otherwise making sure tests can run independently; a sketch follows after this list.
  • Get rid of verify_series_and_expr_api. All Series methods that have a matching Expr method already dispatch to it - no need to run these tests twice.
  • Profile tests using pytest --durations X and mark the slowest tests as 'slow' to remove them from the regular run.
  • Rethink the folder/file structure
    • I think we can do a bit better here; the current setup is not too intuitive.
  • Better utilize pytest tools like parametrize and fixture.
  • Better utilize our own testing tools
    • We have things like assert_frame_equal and assert_series_equal, but don't use them consistently.
    • We often check only shapes or rows because specifying a full frame can be cumbersome. If we formalize this somehow, it will make writing our tests a lot easier (an example combining parametrize with assert_frame_equal follows after this list).
  • Get rid of the doctest: +IGNORE_RESULT directive
    • My main motivation for doing this was to be able to run doctests within the pytest suite. While this is possible, the configuration options are limited/cumbersome. For example, we'd need a conftest.py in the py-polars directory to inject import polars as pl for each test. I don't like tests leaking out into the main folder structure, so I think we are better off with our current approach.
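
As a sketch of the markers idea (not the actual py-polars setup): custom markers can be registered from a conftest.py, and a collection hook can mark every test under a given directory. The marker names and the tests/io path below are assumptions for illustration only.

```python
# conftest.py - a sketch, not the actual py-polars configuration
from pathlib import Path

import pytest


def pytest_configure(config):
    # Register custom markers so selections like `pytest -m "not slow"` work
    # without warnings about unknown marks.
    config.addinivalue_line("markers", "slow: tests that take a long time to run")
    config.addinivalue_line("markers", "io: tests that read from or write to disk")


def pytest_collection_modifyitems(config, items):
    # Mark every collected test that lives under a (hypothetical) tests/io directory.
    for item in items:
        if "io" in Path(str(item.fspath)).parts:
            item.add_marker(pytest.mark.io)
```

With markers in place, the slowest tests found via pytest --durations 10 could be tagged and excluded with pytest -m "not slow", and pytest-xdist's -n auto flag distributes the remaining tests over all available cores.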
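For the IO tests, a minimal sketch of an independent round-trip test using pytest's built-in tmp_path fixture (a per-test temporary directory, so TemporaryDirectory does not have to be managed by hand); the test name and frame contents are made up:

```python
import polars as pl
from polars.testing import assert_frame_equal


def test_csv_round_trip(tmp_path):
    # tmp_path is a unique temporary directory per test, so nothing is shared
    # between tests and nothing is left behind on disk.
    df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
    file = tmp_path / "round_trip.csv"
    df.write_csv(file)
    assert_frame_equal(pl.read_csv(file), df)
```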
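And a sketch combining parametrize with assert_frame_equal, to illustrate checking full frames (values and dtypes) instead of only shapes; the test cases are illustrative only:

```python
import polars as pl
import pytest
from polars.testing import assert_frame_equal


@pytest.mark.parametrize(
    ("values", "expected_sum"),
    [
        ([1, 2, 3], 6),
        ([None, 2, 3], 5),
    ],
)
def test_sum(values, expected_sum):
    result = pl.DataFrame({"a": values}).select(pl.col("a").sum())
    # assert_frame_equal verifies dtypes as well as values, unlike a bare
    # shape or row comparison.
    assert_frame_equal(result, pl.DataFrame({"a": [expected_sum]}))
```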

Input is welcome!

@zundertj (Collaborator)

On the doctests: it would be nice to have it as part of the same test suite, but I don't think there is much added value in doing so? In particular, I like the split as it is because the temptation will otherwise be to use the doctests as the primary tool to drive test coverage and skip on the unit test altogether. Instead, the doc examples should be geared towards documenting the function and giving examples of where a function may be useful. That partly overlaps with the goals of a unit test, but not completely?

@ghuls (Collaborator) commented Jan 22, 2023

> We often check only shapes or rows because specifying a full frame can be cumbersome. If we formalize this somehow, it will make writing our tests a lot easier.

This has hidden quite a few bugs in the past, as in a lot of cases dtypes were not checked.
I think it would also be a good idea to have one or more standard test dataframes covering all currently supported dtypes, and to run tests against all of them where it makes sense. Most tests just use dataframes with very simple dtypes.

And for properly testing the streaming operations, much bigger dataframes are required to trigger all codepaths, I assume.
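
A sketch of what such a shared multi-dtype fixture could look like; the fixture name, column names, and dtype selection here are only illustrative, not a proposed canonical frame:

```python
from datetime import date, datetime

import polars as pl
import pytest


@pytest.fixture
def df_all_dtypes() -> pl.DataFrame:
    # A small frame covering several dtypes (including nulls), so operations
    # are exercised against more than just simple int/str columns.
    return pl.DataFrame(
        {
            "int": [1, 2, 3],
            "float": [1.0, 2.5, None],
            "bool": [True, False, True],
            "str": ["a", "b", None],
            "date": [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 3)],
            "datetime": [datetime(2023, 1, 1), None, datetime(2023, 1, 3)],
        }
    )
```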

@stinodego (Member, Author)

> On the doctests: it would be nice to have it as part of the same test suite, but I don't think there is much added value in doing so? In particular, I like the split as it is because the temptation will otherwise be to use the doctests as the primary tool to drive test coverage and skip on the unit test altogether. Instead, the doc examples should be geared towards documenting the function and giving examples of where a function may be useful. That partly overlaps with the goals of a unit test, but not completely?

I agree that test coverage should come from 'actual' tests in the tests folder, not from doc examples. That's exactly why I would like to get rid of the +IGNORE_RESULT directive; the method should already be tested in the unit tests, so if the doc example gives inconsistent results, we might as well just skip it.

I think it'll just be a bit cleaner to have pytest handle all the testing; configuration can then be in the pyproject.toml instead of in a random Python file in the tests folder. We can just exclude doctests from the make coverage command, and we're good to go.

@zundertj (Collaborator)

IGNORE_RESULT allows us to check that the code does not error, which was, at least at the time, happening for a number of examples. Skipping altogether means there is no validation that the examples even run. We should probably try to minimize the number of examples where this directive is needed in the first place, to limit its impact.
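
For context on the mechanism, a sketch of how a custom IGNORE_RESULT directive can be wired up with the standard library doctest module; the actual py-polars doctest runner may be organised differently:

```python
import doctest

# Register the custom option flag so `# doctest: +IGNORE_RESULT` is recognised.
IGNORE_RESULT = doctest.register_optionflag("IGNORE_RESULT")


class IgnoreResultOutputChecker(doctest.OutputChecker):
    def check_output(self, want, got, optionflags):
        # With the flag set, the example only has to run without raising;
        # its printed output is not compared against the docstring.
        if optionflags & IGNORE_RESULT:
            return True
        return super().check_output(want, got, optionflags)
```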

@stinodego (Member, Author)

There is still work to be done, but I'll close this for now as the most pressing concerns have been addressed.
