Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a property-based test using Hypothesis & Hypothesmith #1566

Merged
merged 1 commit into from
Aug 21, 2020

Conversation

Zac-HD
Copy link
Contributor

@Zac-HD Zac-HD commented Jul 30, 2020

At PyCon 2019, I gave a few talks about property-based testing, and also spoke to @ambv about generating test inputs for Black. There's a long history of this working really well for compilers (e.g. CSmith), and for formatting tools like prettier/prettier#371.

So starting at the sprints I wrote Hypothesmith to generate Python source code, reported a bunch of bugs, and eventually after some more upgrades thought that it might be useful to test black in its own test suite as well as my downstream code. So here we are!

This test will fail due to #970, #1012, #1557, and possibly other bugs. While Hypothesis will present (mostly) minimal failing inputs and can report multiple bugs based on exception type and location, you can expect to see mostly trivial or who-would-ever-write-that failures for a while. However, fixing those edge cases will allow Hypothesmith to explore elsewhere and find more subtle bugs.

@JelleZijlstra
Copy link
Collaborator

This is a great tool, thank you!

I'm not sure a normal unit test is a good place for this sort of test though. For one, it's currently failing, so we have to first fix the bugs you linked. As I understand it, it's also partly random, which is great for finding new edge cases, but means that people submitting PRs or running tests locally may see tests failing randomly because we happened to find a previously undetected bug.

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Jul 31, 2020

Glad you like it! There are definitely some questions about how you want to use it, but I thought that opening a PR would give us a more concrete starting point for discussion than an issue 😄

For one, it's currently failing, so we have to first fix the bugs you linked.

Yeah... the obvious options are (1) don't merge it yet, (2) @unittest.expectedFailure, or (3) give known bugs a distinct exception type which can be skipped within the test. Order of increasing work and also order of increasing chance to find other bugs, for what that's worth.

It's also partly random, which is great for finding new edge cases, but means that people submitting PRs or running tests locally may see tests failing randomly because we happened to find a previously undetected bug.

Using the derandomize=True setting would guarantee non-flakiness... it's even possible to use a settings profile to make it quick-and-deterministic or run many random examples depending on e.g. an environment variable. Two kinds of testing describes this pretty nicely.

'Sleeper' test failures are pretty rare though - upstream in Hypothesis we might have two or three per year over a few thousand tests. Once property-based tests are passing they don't tend to discover more failures, but you can easily run them in a loop overnight if that's a concern and better tooling for this is planned.

@ichard26
Copy link
Collaborator

ichard26 commented Aug 2, 2020

Thanks for the PR! I recently was introduced to Hypothesis and Hypothesmith while contributing to flake8-bugbear and I really like it! I do have a bit of concerns though, here's my two cents FWIW.

As a regular contributor, I second JelleZijlstra's slight apprehension about having Hypothesmith as a normal unit test. Since the flow we somewhat document on creating a new PR on the Black repository includes running all unit tests, quite a few PR contributors would hit the Hypothesmith test failing (from looking how many build jobs failed on this PR). Tests are usually designed to either make sure something works as expected or a bug wasn't reintroduced, i.e. making sure stuff still works. While Hypothesmith pushes the code further and further so it can find unseen bugs.

Regular unit tests and Hypothesmith tests are similar, but have different enough goals that they have different appropriate reactions to when they fail. This would add more complexity for a new contributor or a drive-by minor contributor since they might not be familiar enough with project and/or Hypothesmith to know/understand what to do next. Experienced contributor can deal with this new test once they quickly get up to speed with it, but means contributing to Black has a bit more friction. Although, good documentation on this special test in Black's CONTRIBUTING.md could go a long way on making this easier for newer contributor.

Another problem for new / newer contributor is that even if they submit a simple typo fix PR fully through the web interface, there is a non-small chance that the checks will fail due to luck. And if they aren't familiar with Hypothesmith or CI in general, they will probably end up confused. A red X isn't nice to see, and definitely not for newer contributors. Not the greatest experience for someone joining Black, the PSF org, or even open source (and on this front, Black can definitely do better).

TL;DR is that this addition complicates the path of contributing to Black, mostly for newer and less experienced contributors.

And no, I still want to see Hypothesis and Hypothesmith bring better testing to Black, but as you have pointed out, this is merely the start of a discussion :)

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Aug 2, 2020

Thanks for your comment @ichard26! I think you're suggesting that we

  • make this test deterministic with the derandomize=True setting
  • and/or move it out of the main test suite

which makes sense to me.

@cooperlees
Copy link
Collaborator

Thanks @Zac-HD! +1 on separate test here. Maybe only on Ubuntu too and I’ll be happy to merge!

@Zac-HD
Copy link
Contributor Author

Zac-HD commented Aug 20, 2020

@cooperlees - how will you want to run this? A new tests/test_fuzz.py using unittest, a standalone script to run with python fuzztests.py, or what?

@cooperlees
Copy link
Collaborator

@cooperlees - how will you want to run this? A new tests/test_fuzz.py using unittest, a standalone script to run with python fuzztests.py, or what?

I feel a new fuzz_tests.yml GitHub Action workflow only on ubuntu unless you see value testing in different OS's was what I envisioned

  • cp .github/workflows/test.yml .github/workflows/fuzz.yml
  • edit
  • Win

Maybe even run it with coverage to see how much of the code the fuxx tests hit for some signal too?

  • coverage run fuzz.py
  • coverage report

https://github.com/psf/black/blob/master/.github/workflows/test.yml#L28

@Zac-HD Zac-HD force-pushed the property-tests branch 2 times, most recently from 15307de to 0dc7694 Compare August 21, 2020 11:05
@cooperlees
Copy link
Collaborator

This looks great, will merge if I can rebase after @ambv’s large magic trailing comma rewrite!

@ambv ambv merged commit cd3a93a into psf:master Aug 21, 2020
@ambv
Copy link
Collaborator

ambv commented Aug 21, 2020

Thanks! ✨ 🍰 ✨

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants