Support deterministic testing using Parameters. #263
The one thing that's a bit ugly is that our |
I'll take a look and see if I can work around these MiMA errors, sorry! |
So I'm not 100% sure but it seems like changing this API might break binary compatibility. @rickynils is this worth putting on a roadmap for 1.14.0? What do you think? I'll try to minimize the disruption (and most of the errors occur on a private inner class) but it's not totally clear we can get this in 1.13.x. |
With these API changes, if I don't supply an explicit initialSeed and ScalaCheck thus uses a random one, is there a way for me to retrieve which seed was used? |
@sirthias I'll try to see if that can be done. I wanted to avoid a larger change of threading RNG state through the higher-level structures, but that may be necessary to be able to display the seed correctly. |
Thanks again, Erik! |
@sirthias Good to know! I'm still looking into a larger change that threads the seed through the higher level structure, which makes it easier to display the corresponding seed for a failing test/property, but don't have anything working yet. @rickynils Is this something you'd want to see in 1.14? Would you prefer a different approach? Is this something you'd rather not include? |
Gentlemen, it's great that you cross-published 1.12 for Scala 2.12, but unfortunately the only version of ScalaTest that's available for 2.12 depends on ScalaCheck 1.13 (and we use ScalaCheck as a sub-dependency of ScalaTest). I'm going to ask @bvenners if it'd be possible to cross-publish ScalaTest 2.2.6 for Scala 2.12, but the only real way forward would be to get this PR merged and ScalaCheck 1.14 underway. Thanks again for all the help with this issue! |
@sirthias It's @rickynils' call. I was hoping to be able to provide a nicer high-level API (e.g. displaying seeds used for failing properties) but at this point it seems like the compatibility cost for that might be too high. |
(I'll push to update the merge conflicts ASAP.) |
Thanks, Erik! |
@sirthias I agree it's unfortunate. The culprit here is I think we should merge this (to unblock you) since the API here is basically fine. If @rickynils wants to authorize a more major refactor of the high-level test framework I could definitely deliver a PR that displays seeds for failing properties, but it might be at substantial internal churn/cost. (As a side-note: I put about 6 hours into printing/reading seeds as Base-64 and threading seeds through the framework before hitting this wall, which was a big disappointment. I still have that branch around, but will probably not share it.) |
Displaying the seed, if not explicitly passed, is absolutely crucial! Is there a hack to figure out what the seed was? |
@arosien I can imagine providing a Prop combinator to print seeds on failure (independent of other test output). Might be annoying/fragile to use but would work in a pinch. |
@arosien My approach will be to simply always set the seed manually. Then I always know what the seed was if a test fails. But I'm using ScalaCheck underneath a small custom mini-DSL, not directly, which makes adding such logic easy. |
This commit adds an `initialSeed` parameter to the Parameter types that Test and Gen use to control test execution. When set to `None`, things behave as they usually do, but when set to `Some(seed)`, that seed will be used to start the execution. This makes it easier to configure repeatable tests. In the future, you could imagine ScalaCheck emitting which seed it started with during a failing test, and providing an interface for re-running the failing test with the same seed.
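To make the `None` versus `Some(seed)` behaviour concrete, here is a minimal sketch using only the Scala standard library. `ToyParams` and `ToyGen` are hypothetical stand-ins invented for illustration; they are not ScalaCheck's actual types, and the real implementation uses its own seed type rather than a `Long`.

```scala
import scala.util.Random

// Hypothetical stand-in for a parameters type; not ScalaCheck's actual class.
final case class ToyParams(size: Int = 10, initialSeed: Option[Long] = None)

object ToyGen {
  // Generates `size` integers in [0, 100).
  // With Some(seed) the output is repeatable; with None a fresh random seed is picked.
  def ints(params: ToyParams): List[Int] = {
    val seed = params.initialSeed.getOrElse(Random.nextLong())
    val rng = new Random(seed)
    List.fill(params.size)(rng.nextInt(100))
  }
}
```

Calling `ToyGen.ints(ToyParams(size = 5, initialSeed = Some(42L)))` twice gives the same list both times, while leaving `initialSeed` as `None` will generally give a different list on each call.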
The big problem is that most of Prop's combinators don't communicate RNG state (or parameter state) between them. This means that things like nested forAll don't work yet.
As far as I can tell these are working correctly. There was some trouble with sub-Prop evaluation, where the previous strategy of "clearing" the initial seed was causing sub-Props to be non-deterministic. We're now using Seed#slide to keep sub-Props using a (different) initialSeed. It's important not to just use the same seed, otherwise things like `forAll { (x: Int, y: Int) => x == y }` would always be true. At this point (barring bugs) I think this feature is working.
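The point about not reusing the same seed can be illustrated with a toy model built on `scala.util.Random`. This is only a sketch: `draw` and `nextSeed` are hypothetical helpers, and `nextSeed` is a loose analogue of advancing a seed, not the actual `Seed#slide` implementation.

```scala
import scala.util.Random

object SeedReuse {
  // Draws one Int from a generator started at `seed`.
  def draw(seed: Long): Int = new Random(seed).nextInt()

  // Toy analogue of advancing a seed: derive a new seed from the current one.
  // (Hypothetical helper; ScalaCheck's Seed#slide operates on its own seed type.)
  def nextSeed(seed: Long): Long = new Random(seed).nextLong()

  // Both values start from the same seed, so x == y always holds.
  def pairSameSeed(seed: Long): (Int, Int) = (draw(seed), draw(seed))

  // The second value starts from a derived seed: still repeatable,
  // but no longer forced to equal the first value.
  def pairDerivedSeed(seed: Long): (Int, Int) = (draw(seed), draw(nextSeed(seed)))
}
```

With `pairSameSeed`, a property like `x == y` can never fail, which hides the bug the property is meant to expose. With `pairDerivedSeed`, the pair is still fully determined by the starting seed, so a failing case remains reproducible.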
@sirthias @arosien Alright, I think this may work. The new thing is to define properties with an optional seed. For example:

    val encodedSeed = "rXQKWGPSKJEptGJNpblk2_Cc4XpV0mDgBigZu7aiiwK="
    // will always test the same x and y
    propertyWithSeed("bogus", Some(encodedSeed)) =
      forAll { (x: Int, y: Int) => x == y }
    // the equivalent for working directly with a Prop would be
    prop.useSeed("bogus", rng.Seed.fromBase64(encodedSeed))

To figure out what seed is being used for a failing case, you can use:

    // will print out failing seeds
    propertyWithSeed("bogus", None) =
      forAll { (x: Int, y: Int) => x == y }
    // the equivalent working directly with a Prop would be
    prop.viewSeed("bogus")

(The name is required so that the [...])

There's a whole lot of machinery behind this but I think this is the cleanest interface to use. The strings have to be valid Base-64 and have exactly the right length (256 bits). An easy way to generate random seeds is:

    println(rng.Seed.random.toBase64)

What do you all think? @rickynils is this diff too extreme? I may be able to pare it down a bit but not too much here is extraneous. (Also apologies for the custom Base-64 encoding -- I didn't want to add a dependency and there didn't seem to be another good way to do it that didn't require Java 8.) |
@rickynils Those build failures look like transient timeouts to me. What do you think of the overall design here? |
If we test this property 100 times per test run, we'd expect it to fail every (1.7M / 100) times, i.e. one in every seventeen-thousand test runs. That seems like an acceptable level of false positives.
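The arithmetic can be checked with a short sketch. It takes the 1.7M figure from the comment as given (a per-check failure probability of roughly 1 in 1.7 million) and assumes the 100 checks in a run are independent. `FalsePositiveRate` is a name made up for this illustration.

```scala
object FalsePositiveRate {
  // Probability that at least one of `trials` independent checks fails,
  // given a per-check failure probability `p`.
  def perRun(p: Double, trials: Int): Double =
    1.0 - math.pow(1.0 - p, trials.toDouble)

  // Expected number of test runs between failures.
  def runsPerFailure(p: Double, trials: Int): Double =
    1.0 / perRun(p, trials)
}
```

For such a small `p`, `perRun(p, 100)` is very close to `100 * p`, so `FalsePositiveRate.runsPerFailure(1.0 / 1.7e6, 100)` comes out at roughly 17,000, matching the estimate above.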
What about always printing the seed, as soon as it's randomly selected? Means you have to run all the tests to reproduce, but at least it's something. |
@dwijnand Right now there isn't a fixed top-level seed that controls the entire run. A design like that is possible but would require more changes to how ScalaCheck works (for example, parallelism introduces non-determinism). We could definitely always print seeds for all failing top-level properties, if that's something @rickynils is interested in. The biggest reason I didn't do that by default was that it wouldn't integrate with the existing output mechanisms. But it would make it super easy to reproduce particular properties. |
@non Second time I apologize for late feedback on this PR... Sorry, just busy with other stuff here. But I will try to play around with this tomorrow. From what I've gathered so far, this looks great. May have more meaningful feedback tomorrow... |
@rickynils Thanks! No problem, take your time. I just wanted to make sure to flag progress (since it was quiet for a while). |
This looks great! |
@sirthias I can't tell when 1.14.0 will be released with this feature. Since we're probably going to break binary compatibility with 1.13.x, I'd like to not rush the 1.14.0 release. However, you should be able to use a "nightly" build of 1.14.0 with this feature available before it is released. I guess that also depends on the availability of compatible ScalaTest builds. |
Thanks, @rickynils! Yes, a nightly build would definitely work for our use case. I'm sure I'll find a way to work around the missing ScalaTest integration. |
@non The Maybe this is simply what you mean with:
If we disregard parallelism, what would you say is required to get the top-level seed working? Btw, for a top-level seed to make sense we probably also have to introduce a concept of "fixed-size" test runs. Because a failure will probably only be reproducible with both the seed and the size that was used. |
@rickynils You're right, I left that in but it's currently unused. I should probably remove it from this PR. For a top-level seed to work, here's what I think we'd need:
The current thing is a compromise that was relatively easy to plug in but also does what's needed. It would also be possible to have every top-level If you want I can try to open a new PR that changes the API more drastically to get a top-level seed working. |
@non I think having top-level seeds makes this feature more approachable. Messing around a bit with the API is probably worth it (and I think we must have a discussion about binary compatibility management whatever we do anyway; we must allow for changes somehow). With that said, the end-user API (the property combinators) shouldn't really change? If you want to merge the current code as a first phase, or start out fresh with fewer restrictions, I'm fine either way. It's your call. While we're on the subject, I assume the test runner ( |
@rickynils OK, that sounds good. Either approach works for me (the current failures seem transient). I'll start working on it on a branch off this one. I don't think we have to worry about shrinking -- that is, if a given property starts with the same seed and parameters, I think it will shrink deterministically anyway. If you think changing how shrinking works is important for other reasons I'm fine with it, but I don't think we need to do that to make a top-level seed (and top-level seed reporting) work. |
(It's worth saying too that the current way we shrink collections might have some advantages over just regenerating completely new data.) |
@non You are correct about shrinking. It is orthogonal to this. And yeah, we would lose something valuable we have now (finding the "structure" that falsifies a property). I have to think a bit deeper about it. I'll merge this, it'll make future merges easier. |
This commit adds an `initialSeed` parameter to the Parameter types that Test and Gen use to control test execution. When set to `None`, things behave as they usually do, but when set to `Some(seed)`, that seed will be used to start the execution.
This makes it easier to configure repeatable tests. In the future, you
could imagine ScalaCheck emitting which seed it started with during a
failing test, and providing an interface for re-running the failing
test with the same seed.
Also included are some tests to ensure generators are deterministic.
We can add more of these for different types of generators in the future
as needed.
Review by @rickynils and @sirthias.