Support deterministic testing using Parameters. #263

non · 2016-09-10T20:59:39Z

This commit adds an initialSeed parameter to the Parameter types
that Test and Gen use to control test execution. When set to None,
things behave as they usually do, but when set to Some(seed), that
seed will be used to start the execution.

This makes it easier to configure repeatable tests. In the future, you
could imagine ScalaCheck emitting which seed it started with during a
failing test, and providing an interface for re-running the failing
test with the same seed.

Also included are some tests to ensure generators are deterministic.
We can add more of these for different types of generators in the future
as needed.

Review by @rickynils and @sirthias.

non · 2016-09-10T21:04:18Z

The one thing that's a bit ugly is that our rng.Seed has 256-bits of state, but we don't expose a convenient constructor to initialize with four Long values. Since we also don't necessarily show people the seed being used, this is maybe OK. In the long run it might be nice to give Seed a nice way to display itself (maybe Base-64 encoded?) and then have a constructor that works on those values.

non · 2016-09-11T02:50:25Z

I'll take a look and see if I can work around these MiMA errors, sorry!

non · 2016-09-13T01:23:49Z

So I'm not 100% sure but it seems like changing this API might break binary compatibility. @rickynils is this worth putting on a roadmap for 1.14.0? What do you think?

I'll try to minimize the disruption (and most of the errors occur on a private inner class) but it's not totally clear we can get this in 1.13.x.

sirthias · 2016-09-15T10:01:16Z

With these API changes, If I don't supply an explicit initialSeed and scalacheck thus uses a random one, is there a way for me to retrieve which seed was used?

non · 2016-09-19T17:29:25Z

@sirthias I'll try to see if that can be done.

I wanted to avoid a larger change of threading RNG state through the higher-level structures, but that may be necessary to be able to display the seed correctly.

sirthias · 2016-09-20T07:20:34Z

Thanks again, Erik!
The feature of having access to the randomly set seed is no showstopper though, since I can also simply always set my own seed, random or not, to have complete control over everything,

non · 2016-09-21T17:33:48Z

@sirthias Good to know!

I'm still looking into a larger change that threads the seed through the higher level structure, which makes it easier to display the corresponding seed for a failing test/property, but don't have anything working yet.

@rickynils Is this something you'd want to see in 1.14? Would you prefer a different approach? Is this something you'd rather not include?

rickynils · 2016-09-21T18:53:40Z

@non Sorry for the late feedback. I think this looks great, and definitely should be in 1.14. The approach of having the initial seed as part of Parameters was exactly what I had in mind when I opened #67.

sirthias · 2016-11-14T16:33:29Z

Gentlemen,
just for our information: Do you have any indication of when this might make into a release?
ScalaCheck 1.13 isn't usable really usable for us since there is no way to achieve repeatable tests, which was still possible with 1.12.

It's great that you cross-published 1.12 for Scala 2.12, but unfortunately the only version of Scalatest that's available for 2.12 depends on ScalaCheck 1.13 (and we use ScalaCheck as a sub-dependency of Scalatest).

I'm gonna ask @bvenners if it'd be possible to cross-publish Scalatest 2.2.6 for Scala 2.12 but the only real way forward would be to get this PR merged and ScalaCheck 0.14 underway.

Thanks again for all the help with this issue!

non · 2016-11-14T18:01:59Z

@sirthias It's @rickynils' call. I was hoping to be able to provide a nicer high-level API (e.g. displaying seeds used for failing properties) but at this point it seems like the compatibility cost for that might be too high.

non · 2016-11-14T18:06:04Z

(I'll push to update the merge conflicts ASAP.)

sirthias · 2016-11-14T20:22:27Z

Thanks, Erik!
My impression is that, somehow, people don't miss this repeatable test feature too much (I don't really know why, it's absolutely crucial to me), so this extra bit of API niceness can probably be de-prioritized if there are real downsides.

non · 2016-11-14T20:30:34Z

@sirthias I agree it's unfortunate. The culprit here is Test.Result, and how results are currently composed (it's difficult to smuggle extra information through results to where the failures are displayed without losing information through combinators like && or ||).

I think we should merge this (to unblock you) since the API here is basically fine. If @rickynils wants to authorize a more major refactor of the high-level test framework I could definitely deliver a PR that displays seeds for failing properties, but it might be at substantial internal churn/cost.

(As a side-note: I put about 6 hours into printing/reading seeds as Base-64 and threading seeds through the framework before hitting this wall, which was a big disappointment. I still have that branch around, but will probably not share it.)

arosien · 2016-11-14T20:42:37Z

Displaying the seed, if not explicitly passed, is absolutely crucial! Is there a hack to figure out what the seed was?

non · 2016-11-14T20:44:53Z

@arosien I can imagine providing a Prop combinator to print seeds on failure (independent of other test output). Might be annoying/fragile to use but would work in a pinch.

sirthias · 2016-11-14T20:48:21Z

@arosien My approach will be to simply always set the seed manually. Then I always know what the seed was if a test fails. But I'm using ScalaCheck underneath a small custom mini-DSL, not directly, which makes adding such logic easy.

This commit adds an `initialSeed` parameter to the Parameter types that Test and Gen use to control test execution. When set to `None`, things behave as they usually do, but when set to `Some(seed)`, that seed will be used to start the execution. This makes it easier to configure repeatable tests. In the future, you could imagine ScalaCheck emitting which seed it started with during a failing test, and providing an interface for re-running the failing test with the same seed.

The big problem is that most of Prop's combinators don't communicate RNG state (or parameter state) between them. This means that things like nested forAll don't work yet.

As far as I can tell these are working correctly. There was some trouble with sub-Prop evaluation, where the previous strategy of "clearing" the initial seed was causing sub-Props to be non-deterministic. We're now using Seed#slide to keep sub-Props using a (different) initialSeed. It's important not to just use the same seed, otherwise things like `forAll { (x: Int, y: Int) => x == y }` would always be true. At this point (barring bugs) I think this feature is working.

non · 2016-11-16T18:19:11Z

@sirthias @arosien Alright, I think this may work.

The new thing is to define properties with an optional seed. For example:

val encodedSeed = "rXQKWGPSKJEptGJNpblk2_Cc4XpV0mDgBigZu7aiiwK="

// will always test the same x and y
propertyWithSeed("bogus", Some(encodedSeed)) =
  forAll { (x: Int, y: Int) => x == y }

// the equivalent for working directly with a Prop would be
prop.useSeed("bogus", rng.Seed.fromBase64(encodedSeed))

To figure out what seed is being used for a failing case, you can use:

// will print out failing seeds
propertyWithSeed("bogus", None) =
  forAll { (x: Int, y: Int) => x == y }

// the equivalent working directly with a Prop would be
prop.viewSeed("bogus")

(The name is required so that the println from within Prop on failing cases can give you some context.)

There's a whole lot of machinery behind this but I think this is the cleanest interface to use. The strings have to be valid Base-64 and have exactly the right length (256 bits). An easy way to generate random seeds is:

println(rng.Seed.random.toBase64)

What do you all think? @rickynils is this diff too extreme? I may be able to pare it down a bit but not too much here is extraneous.

(Also apologies for the custom Base-64 encoding -- I didn't want to add a dependency and there didn't seem to be another good way to do it that didn't require Java 8.)

non · 2016-11-16T20:13:05Z

@rickynils Those build failures look like transient timeouts to me.

What do you think of the overall design here?

If we test this property 100 times per test run, we'd expect it to fail every (1.7M / 100) times, i.e. one in every seventeen-thousand test runs. That seems like an acceptable level of false positives.

dwijnand · 2016-11-16T21:45:54Z

What about always printing the seed, as soon as it's randomly selected?

Means you have to run all the tests to reproduce, but at least it's something.

non · 2016-11-16T21:49:38Z

@dwijnand Right now there isn't a fixed top-level seed that controls the entire run. A design like that is possible but would require more changes to how ScalaCheck works (for example, parallelism introduces non-determinism).

We could definitely always print seeds for all failing top-level properties, if that's something @rickynils is interested in. The biggest reason I didn't do that by default was that it wouldn't integrate with the existing output mechanisms. But it would make it super easy to reproduce particular properties.

rickynils · 2016-11-16T22:09:15Z

@non Second time I apologize for late feedback on this PR... Sorry, just busy with other stuff here. But I will try to play around with this tomorrow. From what I've gathered so far, this looks great. May have more meaningful feedback tomorrow...

non · 2016-11-16T22:09:56Z

@rickynils Thanks! No problem, take your time. I just wanted to make sure to flag progress (since it was quiet for awhile).

sirthias · 2016-11-17T08:12:31Z

This looks, great!
Thank you very much again, Erik, for having taken this on!

rickynils · 2016-11-18T13:36:53Z

@sirthias I can't tell when 1.14.0 will be released with this feature. Since we're probably going to break binary compatibility with 1.13.x, I'd like to not rush the 1.14.0 release.

However, you should maybe be able to use a "nightly" build of 1.14.0 with this feature available before it it is released. I guess that also depends on the availability of compatible ScalaTest builds.

sirthias · 2016-11-18T13:39:29Z

Thanks, @rickynils! Yes, a nightly build would definitely work for our use case. I'm sure I'll find a way to work around the missing ScalaTest integration.

rickynils · 2016-11-18T14:26:48Z

@non The initialSeed stuff you've added to Test.Parameters, is that work in progress? Because it doesn't seem to be used right now?

Maybe this is simply what you mean with:

Right now there isn't a fixed top-level seed that controls the entire run

If we disregard parallelism, what would you say is required to get the top-level seed working?

Btw, for a top-level seed to make sense we probably also have to introduce a concept of "fixed-size" test runs. Because a failure will probably only be reproducible with both the seed and the size that was used.

non · 2016-11-18T15:16:10Z

@rickynils You're right, I left that in but it's currently unused. I should probably remove it from this PR.

For a top-level seed to work, here's what I think we'd need:

deterministic test order (or deterministic sub-seeds based on hashes or something)
either fixed size, or size determined by the seed at the start
property results returning the seed they started with (and ended up with)
require pretty much everything (props, properties, etc) to take a seed to work with, instead of what they do now.

The current thing is a compromise that was relatively easy to plug in but also does what's needed. It would also be possible to have every top-level Prop print a seed on failure (as I mentioned) which would make every top-level Prop failure easy to reproduce.

If you want I can try to open a new PR that changes the API more drastically to get a top-level seed working.

rickynils · 2016-11-21T15:13:09Z

@non I think having top-level seeds makes this feature more approachable. Messing around a bit with the API is probably worth it (and I think we must have a discussion about binary compatibility management whatever we do anyway. We must allow for changes somehow). With that said, the end-user API (the property combinators) shouldn't change really?

If you want to merge the current code as a first phase, or start out fresh with less restrictions, I'm fine either way. It's your call.

While we're on the subject, I assume the test runner (org.scalacheck.Test.check) will get some more responsibilities (like controlling seeds between different evaluations of a prop). Maybe we should let it take control over shrinking too? What if we implement shrinking solely as a function of the size parameter? If that is possible/practical, it would solve a lot of problems. The job of the test runner would then be to hunt down a pair of (seed,size), and re-running tests wouldn't also mean re-shrinking the test cases.

non · 2016-11-21T15:15:47Z

@rickynils OK, that sounds good. Either approach works for me (the current failures seem transient). I'll start working on it on a branch off this one.

I don't think we have to worry about shrinking -- that is, if a given property starts with the same seed and parameters, I think it will shrink deterministically anyway. If you think changing how shrinking works is important for other reasons I'm fine with it, but I don't think we need to do that to make a top-level seed (and top-level seed reporting) work.

non · 2016-11-21T15:17:18Z

(It's worth saying too that the current way we shrink collections might have some advantages over just regenerating completely new data.)

rickynils · 2016-11-21T15:24:52Z

@non You are correct about shrinking. It is orthogonal to this. And yeah, we would lose something valuable we have now (finding the "structure" that falsifies a property). I have to think a bit deeper about it.

I'll merge this, it'll make future merges easier.

non mentioned this pull request Sep 10, 2016

Remove mutable random number generator in Gen #67

Closed

rickynils mentioned this pull request Sep 26, 2016

filter properties based on a criteria #267

Merged

sirthias mentioned this pull request Nov 14, 2016

Cross-publish 2.2.6 for Scala 2.12 scalatest/scalatest#1011

Open

erik-stripe added 4 commits November 16, 2016 12:25

Support encoding/decoding seeds as Base-64.

24aa986

Prop-level seeding is sort of working.

4f6f62d

The big problem is that most of Prop's combinators don't communicate RNG state (or parameter state) between them. This means that things like nested forAll don't work yet.

non force-pushed the topic/initial-seed branch from 3d5a454 to f4da1be Compare November 16, 2016 18:15

Fix uniform-distribution test.

81a48fb

If we test this property 100 times per test run, we'd expect it to fail every (1.7M / 100) times, i.e. one in every seventeen-thousand test runs. That seems like an acceptable level of false positives.

rickynils merged commit a608637 into typelevel:master Nov 21, 2016

pjfanning mentioned this pull request Jan 22, 2017

release latest master #314

Closed

mdimjasevic mentioned this pull request Jan 23, 2018

1.14.0 release with useSeed #374

Closed

SethTisue mentioned this pull request Apr 24, 2018

ScalaCheck should be printing failing seeds by default #400

Closed

thpani mentioned this pull request Mar 1, 2022

Support TlaFunOper.{funDef,recFunDef} in PBT apalache-mc/apalache#1402

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support deterministic testing using Parameters. #263

Support deterministic testing using Parameters. #263

non commented Sep 10, 2016 •

edited

Loading

non commented Sep 10, 2016

non commented Sep 11, 2016

non commented Sep 13, 2016

sirthias commented Sep 15, 2016

non commented Sep 19, 2016

sirthias commented Sep 20, 2016

non commented Sep 21, 2016

rickynils commented Sep 21, 2016

sirthias commented Nov 14, 2016

non commented Nov 14, 2016

non commented Nov 14, 2016

sirthias commented Nov 14, 2016

non commented Nov 14, 2016 •

edited

Loading

arosien commented Nov 14, 2016

non commented Nov 14, 2016

sirthias commented Nov 14, 2016 •

edited

Loading

non commented Nov 16, 2016 •

edited

Loading

non commented Nov 16, 2016

dwijnand commented Nov 16, 2016 •

edited

Loading

non commented Nov 16, 2016 •

edited

Loading

rickynils commented Nov 16, 2016

non commented Nov 16, 2016

sirthias commented Nov 17, 2016

rickynils commented Nov 18, 2016

sirthias commented Nov 18, 2016

rickynils commented Nov 18, 2016

non commented Nov 18, 2016 •

edited

Loading

rickynils commented Nov 21, 2016

non commented Nov 21, 2016 •

edited

Loading

non commented Nov 21, 2016

rickynils commented Nov 21, 2016

Support deterministic testing using Parameters. #263

Support deterministic testing using Parameters. #263

Conversation

non commented Sep 10, 2016 • edited Loading

non commented Sep 10, 2016

non commented Sep 11, 2016

non commented Sep 13, 2016

sirthias commented Sep 15, 2016

non commented Sep 19, 2016

sirthias commented Sep 20, 2016

non commented Sep 21, 2016

rickynils commented Sep 21, 2016

sirthias commented Nov 14, 2016

non commented Nov 14, 2016

non commented Nov 14, 2016

sirthias commented Nov 14, 2016

non commented Nov 14, 2016 • edited Loading

arosien commented Nov 14, 2016

non commented Nov 14, 2016

sirthias commented Nov 14, 2016 • edited Loading

non commented Nov 16, 2016 • edited Loading

non commented Nov 16, 2016

dwijnand commented Nov 16, 2016 • edited Loading

non commented Nov 16, 2016 • edited Loading

rickynils commented Nov 16, 2016

non commented Nov 16, 2016

sirthias commented Nov 17, 2016

rickynils commented Nov 18, 2016

sirthias commented Nov 18, 2016

rickynils commented Nov 18, 2016

non commented Nov 18, 2016 • edited Loading

rickynils commented Nov 21, 2016

non commented Nov 21, 2016 • edited Loading

non commented Nov 21, 2016

rickynils commented Nov 21, 2016

non commented Sep 10, 2016 •

edited

Loading

non commented Nov 14, 2016 •

edited

Loading

sirthias commented Nov 14, 2016 •

edited

Loading

non commented Nov 16, 2016 •

edited

Loading

dwijnand commented Nov 16, 2016 •

edited

Loading

non commented Nov 16, 2016 •

edited

Loading

non commented Nov 18, 2016 •

edited

Loading

non commented Nov 21, 2016 •

edited

Loading