Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support deterministic testing using Parameters. #263

Merged
merged 5 commits into from
Nov 21, 2016

Conversation

non
Copy link
Contributor

@non non commented Sep 10, 2016

This commit adds an initialSeed parameter to the Parameter types
that Test and Gen use to control test execution. When set to None,
things behave as they usually do, but when set to Some(seed), that
seed will be used to start the execution.

This makes it easier to configure repeatable tests. In the future, you
could imagine ScalaCheck emitting which seed it started with during a
failing test, and providing an interface for re-running the failing
test with the same seed.

Also included are some tests to ensure generators are deterministic.
We can add more of these for different types of generators in the future
as needed.

Review by @rickynils and @sirthias.

@non
Copy link
Contributor Author

non commented Sep 10, 2016

The one thing that's a bit ugly is that our rng.Seed has 256-bits of state, but we don't expose a convenient constructor to initialize with four Long values. Since we also don't necessarily show people the seed being used, this is maybe OK. In the long run it might be nice to give Seed a nice way to display itself (maybe Base-64 encoded?) and then have a constructor that works on those values.

@non
Copy link
Contributor Author

non commented Sep 11, 2016

I'll take a look and see if I can work around these MiMA errors, sorry!

@non
Copy link
Contributor Author

non commented Sep 13, 2016

So I'm not 100% sure but it seems like changing this API might break binary compatibility. @rickynils is this worth putting on a roadmap for 1.14.0? What do you think?

I'll try to minimize the disruption (and most of the errors occur on a private inner class) but it's not totally clear we can get this in 1.13.x.

@sirthias
Copy link
Contributor

With these API changes, If I don't supply an explicit initialSeed and scalacheck thus uses a random one, is there a way for me to retrieve which seed was used?

@non
Copy link
Contributor Author

non commented Sep 19, 2016

@sirthias I'll try to see if that can be done.

I wanted to avoid a larger change of threading RNG state through the higher-level structures, but that may be necessary to be able to display the seed correctly.

@sirthias
Copy link
Contributor

Thanks again, Erik!
The feature of having access to the randomly set seed is no showstopper though, since I can also simply always set my own seed, random or not, to have complete control over everything,

@non
Copy link
Contributor Author

non commented Sep 21, 2016

@sirthias Good to know!

I'm still looking into a larger change that threads the seed through the higher level structure, which makes it easier to display the corresponding seed for a failing test/property, but don't have anything working yet.

@rickynils Is this something you'd want to see in 1.14? Would you prefer a different approach? Is this something you'd rather not include?

@rickynils
Copy link
Contributor

@non Sorry for the late feedback. I think this looks great, and definitely should be in 1.14. The approach of having the initial seed as part of Parameters was exactly what I had in mind when I opened #67.

@sirthias
Copy link
Contributor

Gentlemen,
just for our information: Do you have any indication of when this might make into a release?
ScalaCheck 1.13 isn't usable really usable for us since there is no way to achieve repeatable tests, which was still possible with 1.12.

It's great that you cross-published 1.12 for Scala 2.12, but unfortunately the only version of Scalatest that's available for 2.12 depends on ScalaCheck 1.13 (and we use ScalaCheck as a sub-dependency of Scalatest).

I'm gonna ask @bvenners if it'd be possible to cross-publish Scalatest 2.2.6 for Scala 2.12 but the only real way forward would be to get this PR merged and ScalaCheck 0.14 underway.

Thanks again for all the help with this issue!

@non
Copy link
Contributor Author

non commented Nov 14, 2016

@sirthias It's @rickynils' call. I was hoping to be able to provide a nicer high-level API (e.g. displaying seeds used for failing properties) but at this point it seems like the compatibility cost for that might be too high.

@non
Copy link
Contributor Author

non commented Nov 14, 2016

(I'll push to update the merge conflicts ASAP.)

@sirthias
Copy link
Contributor

Thanks, Erik!
My impression is that, somehow, people don't miss this repeatable test feature too much (I don't really know why, it's absolutely crucial to me), so this extra bit of API niceness can probably be de-prioritized if there are real downsides.

@non
Copy link
Contributor Author

non commented Nov 14, 2016

@sirthias I agree it's unfortunate. The culprit here is Test.Result, and how results are currently composed (it's difficult to smuggle extra information through results to where the failures are displayed without losing information through combinators like && or ||).

I think we should merge this (to unblock you) since the API here is basically fine. If @rickynils wants to authorize a more major refactor of the high-level test framework I could definitely deliver a PR that displays seeds for failing properties, but it might be at substantial internal churn/cost.

(As a side-note: I put about 6 hours into printing/reading seeds as Base-64 and threading seeds through the framework before hitting this wall, which was a big disappointment. I still have that branch around, but will probably not share it.)

@arosien
Copy link
Contributor

arosien commented Nov 14, 2016

Displaying the seed, if not explicitly passed, is absolutely crucial! Is there a hack to figure out what the seed was?

@non
Copy link
Contributor Author

non commented Nov 14, 2016

@arosien I can imagine providing a Prop combinator to print seeds on failure (independent of other test output). Might be annoying/fragile to use but would work in a pinch.

@sirthias
Copy link
Contributor

sirthias commented Nov 14, 2016

@arosien My approach will be to simply always set the seed manually. Then I always know what the seed was if a test fails. But I'm using ScalaCheck underneath a small custom mini-DSL, not directly, which makes adding such logic easy.

This commit adds an `initialSeed` parameter to the Parameter types
that Test and Gen use to control test execution. When set to `None`,
things behave as they usually do, but when set to `Some(seed)`, that
seed will be used to start the execution.

This makes it easier to configure repeatable tests. In the future, you
could imagine ScalaCheck emitting which seed it started with during a
failing test, and providing an interface for re-running the failing
test with the same seed.
The big problem is that most of Prop's combinators don't communicate
RNG state (or parameter state) between them. This means that things
like nested forAll don't work yet.
As far as I can tell these are working correctly. There was some
trouble with sub-Prop evaluation, where the previous strategy of
"clearing" the initial seed was causing sub-Props to be
non-deterministic.

We're now using Seed#slide to keep sub-Props using a (different)
initialSeed. It's important not to just use the same seed, otherwise
things like `forAll { (x: Int, y: Int) => x == y }` would always be
true.

At this point (barring bugs) I think this feature is working.
@non non force-pushed the topic/initial-seed branch from 3d5a454 to f4da1be Compare November 16, 2016 18:15
@non
Copy link
Contributor Author

non commented Nov 16, 2016

@sirthias @arosien Alright, I think this may work.

The new thing is to define properties with an optional seed. For example:

val encodedSeed = "rXQKWGPSKJEptGJNpblk2_Cc4XpV0mDgBigZu7aiiwK="

// will always test the same x and y
propertyWithSeed("bogus", Some(encodedSeed)) =
  forAll { (x: Int, y: Int) => x == y }

// the equivalent for working directly with a Prop would be
prop.useSeed("bogus", rng.Seed.fromBase64(encodedSeed))

To figure out what seed is being used for a failing case, you can use:

// will print out failing seeds
propertyWithSeed("bogus", None) =
  forAll { (x: Int, y: Int) => x == y }

// the equivalent working directly with a Prop would be
prop.viewSeed("bogus")

(The name is required so that the println from within Prop on failing cases can give you some context.)

There's a whole lot of machinery behind this but I think this is the cleanest interface to use. The strings have to be valid Base-64 and have exactly the right length (256 bits). An easy way to generate random seeds is:

println(rng.Seed.random.toBase64)

What do you all think? @rickynils is this diff too extreme? I may be able to pare it down a bit but not too much here is extraneous.

(Also apologies for the custom Base-64 encoding -- I didn't want to add a dependency and there didn't seem to be another good way to do it that didn't require Java 8.)

@non
Copy link
Contributor Author

non commented Nov 16, 2016

@rickynils Those build failures look like transient timeouts to me.

What do you think of the overall design here?

If we test this property 100 times per test run, we'd expect it to
fail every (1.7M / 100) times, i.e. one in every seventeen-thousand
test runs. That seems like an acceptable level of false positives.
@dwijnand
Copy link
Contributor

dwijnand commented Nov 16, 2016

What about always printing the seed, as soon as it's randomly selected?

Means you have to run all the tests to reproduce, but at least it's something.

@non
Copy link
Contributor Author

non commented Nov 16, 2016

@dwijnand Right now there isn't a fixed top-level seed that controls the entire run. A design like that is possible but would require more changes to how ScalaCheck works (for example, parallelism introduces non-determinism).

We could definitely always print seeds for all failing top-level properties, if that's something @rickynils is interested in. The biggest reason I didn't do that by default was that it wouldn't integrate with the existing output mechanisms. But it would make it super easy to reproduce particular properties.

@rickynils
Copy link
Contributor

@non Second time I apologize for late feedback on this PR... Sorry, just busy with other stuff here. But I will try to play around with this tomorrow. From what I've gathered so far, this looks great. May have more meaningful feedback tomorrow...

@non
Copy link
Contributor Author

non commented Nov 16, 2016

@rickynils Thanks! No problem, take your time. I just wanted to make sure to flag progress (since it was quiet for awhile).

@sirthias
Copy link
Contributor

This looks, great!
Thank you very much again, Erik, for having taken this on!

@rickynils
Copy link
Contributor

@sirthias I can't tell when 1.14.0 will be released with this feature. Since we're probably going to break binary compatibility with 1.13.x, I'd like to not rush the 1.14.0 release.

However, you should maybe be able to use a "nightly" build of 1.14.0 with this feature available before it it is released. I guess that also depends on the availability of compatible ScalaTest builds.

@sirthias
Copy link
Contributor

Thanks, @rickynils! Yes, a nightly build would definitely work for our use case. I'm sure I'll find a way to work around the missing ScalaTest integration.

@rickynils
Copy link
Contributor

@non The initialSeed stuff you've added to Test.Parameters, is that work in progress? Because it doesn't seem to be used right now?

Maybe this is simply what you mean with:

Right now there isn't a fixed top-level seed that controls the entire run

If we disregard parallelism, what would you say is required to get the top-level seed working?

Btw, for a top-level seed to make sense we probably also have to introduce a concept of "fixed-size" test runs. Because a failure will probably only be reproducible with both the seed and the size that was used.

@non
Copy link
Contributor Author

non commented Nov 18, 2016

@rickynils You're right, I left that in but it's currently unused. I should probably remove it from this PR.

For a top-level seed to work, here's what I think we'd need:

  1. deterministic test order (or deterministic sub-seeds based on hashes or something)
  2. either fixed size, or size determined by the seed at the start
  3. property results returning the seed they started with (and ended up with)
  4. require pretty much everything (props, properties, etc) to take a seed to work with, instead of what they do now.

The current thing is a compromise that was relatively easy to plug in but also does what's needed. It would also be possible to have every top-level Prop print a seed on failure (as I mentioned) which would make every top-level Prop failure easy to reproduce.

If you want I can try to open a new PR that changes the API more drastically to get a top-level seed working.

@rickynils
Copy link
Contributor

@non I think having top-level seeds makes this feature more approachable. Messing around a bit with the API is probably worth it (and I think we must have a discussion about binary compatibility management whatever we do anyway. We must allow for changes somehow). With that said, the end-user API (the property combinators) shouldn't change really?

If you want to merge the current code as a first phase, or start out fresh with less restrictions, I'm fine either way. It's your call.

While we're on the subject, I assume the test runner (org.scalacheck.Test.check) will get some more responsibilities (like controlling seeds between different evaluations of a prop). Maybe we should let it take control over shrinking too? What if we implement shrinking solely as a function of the size parameter? If that is possible/practical, it would solve a lot of problems. The job of the test runner would then be to hunt down a pair of (seed,size), and re-running tests wouldn't also mean re-shrinking the test cases.

@non
Copy link
Contributor Author

non commented Nov 21, 2016

@rickynils OK, that sounds good. Either approach works for me (the current failures seem transient). I'll start working on it on a branch off this one.

I don't think we have to worry about shrinking -- that is, if a given property starts with the same seed and parameters, I think it will shrink deterministically anyway. If you think changing how shrinking works is important for other reasons I'm fine with it, but I don't think we need to do that to make a top-level seed (and top-level seed reporting) work.

@non
Copy link
Contributor Author

non commented Nov 21, 2016

(It's worth saying too that the current way we shrink collections might have some advantages over just regenerating completely new data.)

@rickynils
Copy link
Contributor

@non You are correct about shrinking. It is orthogonal to this. And yeah, we would lose something valuable we have now (finding the "structure" that falsifies a property). I have to think a bit deeper about it.

I'll merge this, it'll make future merges easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants