Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Cogen for arbitrary instances #1666

Merged
merged 1 commit into from
May 15, 2017

Conversation

ceedubs
Copy link
Contributor

@ceedubs ceedubs commented May 13, 2017

Resolves #1605.

This is a replacement for #1619. The approach taken in that PR led to
implicit resolution failures in scala.js.

Resolves typelevel#1605.

This is a replacement for typelevel#1619. The approach taken in that PR led to
implicit resolution failures in scala.js.
Copy link
Contributor

@kailuowang kailuowang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine. Much better than the existing ones.

Copy link
Contributor

@edmundnoble edmundnoble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with me. Very, very curious how Scala.js can have implicit ambiguities when JVM doesn't.

@ceedubs
Copy link
Contributor Author

ceedubs commented May 14, 2017

@edmundnoble that surprised me too.

@kailuowang
Copy link
Contributor

merge with two sign-off and no objections.

@kailuowang kailuowang merged commit 461781e into typelevel:master May 15, 2017
@kailuowang
Copy link
Contributor

after merging build fails on

  • Kleisli[Option, Int, String].monoid.isEmpty *** FAILED ***
    [info] GeneratorDrivenPropertyCheckFailedException was thrown during property evaluation.
    [info] (Discipline.scala:14)
    [info] Falsified after 1 successful property evaluations.
    [info] Location: (Discipline.scala:14)
    [info] Occurred when passed generated values (
    [info] arg0 = Kleisli()
    [info] )
    [info] Label of failing property:
    [info] (true ?== false) failed

I'm investigating.

@ceedubs
Copy link
Contributor Author

ceedubs commented May 15, 2017

@kailuowang I have been looking into this too. So far I've only seen it failing in the JS build and scala versions less than 2.12.

@ceedubs
Copy link
Contributor Author

ceedubs commented May 15, 2017

@kailuowang one other thing that I noticed is that the isId method that is called in the isEmpty tests may be a little troublesome conceptually when it comes to Kleisli/function instances. It will generate random Kleislis and ensure that we get the same result from calling isEmpty on that Kleisli instance and using the Kleisli Eq instance to compare the generated Kleisli with the result of calling empty on the Kleisli monoid. I think that it's possible (though I suspect extremely unlikely) that we could get inconsistent results on those two calls based on the 50 random values that we happen to pass into the kleisli instances when checking for "equality".

@ceedubs
Copy link
Contributor Author

ceedubs commented May 15, 2017

Could this be some sort of race condition heisenbug in scala.js? I can't seem to recreate it once I put in printlns to try to track down values that it's failing on.

Edit: I was able to recreate it with printlns in place (though it did take many repetitions, so this theory isn't completely ruled out)

@ceedubs
Copy link
Contributor Author

ceedubs commented May 15, 2017

Sorry for the live stream of my thought process while debugging this.

I just realized that a difference here is that for the JVM we are using 50 values to test function equality and for JS we are only using 5. So I think that the real issue here is just what I brought up with #1666 (comment) and it's not as unlikely as I thought when I posted that comment since we are only using 5 values in this case.

So essentially the problem here is that with our testing Eq instance that we use for Kleisli, there's a chance that the Monoid instance for Kleisli returns true for isEmpty, even if the Kleisli doesn't always return an empty value. This is an issue with our tests and not the Kleisli Monoid itself, but I don't know of a particularly good solution to it.

@kailuowang
Copy link
Contributor

kailuowang commented May 15, 2017

@ceedubs the law is basically testing isEmpty being consistent with empty. In Monoid the isEmpty is using Eq[A], for Kleisli, it means that it requires an Eq instance for Kleisli, i.e. an Eq for A => F[B] , there simply isn't a true Eq for Functions, so I think the isEmpty method is simply not available for Monoid[Kleisli[F, A, B] (and there is no way to override that method in the Kleisli instance). We were able to test this because we have a proximate Eq for Functions, but I don't think we should be testing it against laws. Thus we should not be testing this law against Kleisli. WDYT?

@ceedubs
Copy link
Contributor Author

ceedubs commented May 15, 2017

Thus we should be testing this law against Kleisli.

@kailuowang sorry I don't think that I followed you there at the end. We are testing this law against Kleisli aren't we?

@kailuowang
Copy link
Contributor

kailuowang commented May 15, 2017

I meant simply put if there is no Eq for kleisli, we shouldn't be testing isEmpty which isn't available for kleisli.

@ceedubs
Copy link
Contributor Author

ceedubs commented May 15, 2017

@kailuowang I see. It's currently part of the monoid laws, so you would need to remove it from there to not test it for Kleisli. You can't call the monoid laws without an Eq instance, so I guess it kind of makes sense that they would test isEmpty as well.

Another potential thing that you could do is change the isEmpty law to check that Monoid[A].isEmpty(Monoid[A].empty) is true without passing arbitrary A values in and performing the comparison. This would change the semantics a bit in that it would no longer enforce that there's only "one" empty value for a monoid.

@kailuowang
Copy link
Contributor

I was just thinking should empty be unique?

@edmundnoble
Copy link
Contributor

I think empty has to be unique, by contradiction, as @alexknvl stated in the gitter:
e1 === e1 combine e2 === e2.

@kailuowang
Copy link
Contributor

I agree that empty has to be unique by law.
After further inspection of the tests, it seems to me that the test is basically testing the two separatedly generated instances of Eq[A => F[B]] being consistent with each other. This Eq in tests is implemented by passing a number of arbitrary input and comparing the outputs.
On scalajs, the number is only 5, which makes it subject to false positive, i.e. two different functions having a chance to return the same result for 5 inputs and thus become a false positive equal. So the test fails when one Eq instance report false equal while the other doesn't. I verified this hypothesis by turning the number to 3, which made the test fail consistently.
For a short-term solution, I propose we increase the number in scalajs to 10. I just created a PR #1671 to test how it behave on travis.

@edmundnoble
Copy link
Contributor

In the long term, Eq for Function1 is giving us false positives, which translate to false negatives on the left side of an implication. Should we discourage negation of the Eq instance for this reason, or remove it altogether? I don't think false negatives are acceptable given property-based testing's already mammoth issues with reproducibility.

@ceedubs
Copy link
Contributor Author

ceedubs commented May 16, 2017

Thanks @kailuowang. I think that bumping up the number of checks for the JS build is a pretty reasonable solution, at least for the short-to-medium- term.

@edmundnoble I agree that this isn't great. Just to make sure that we are on the same page, you know that the Eq instance only exists in the test scope, right? The problem with removing it is that then we wouldn't be able to test laws on our Function1/Kleisli/State instances at all, which seems worse to me than longer build times and possible occasional build failures due to false positives. I would love to find a more elegant solution, though.

@ceedubs ceedubs deleted the issue-1605-take-3 branch May 16, 2017 14:23
@edmundnoble
Copy link
Contributor

Yeah, I'm aware it's only in test scope. My only issue is with negating the Eq causing false positives to translate to false negatives. The issue is with our usage of the Eq instance.

@ceedubs
Copy link
Contributor Author

ceedubs commented May 16, 2017

@edmundnoble could you please expand upon what you mean by "negating the Eq"? You've said that a couple of times now, and I don't think that I understand it.

@hobwekiva
Copy link

hobwekiva commented May 16, 2017

When you check

P(f, g) = f === g

using Eq[A => B], it's the same as just adding an extra parameter like this:

P(f, g, a) = f(a) === g(a)

At worst you might get some false positives (f is not the same as g, but on your test set they are the same).

Now consider

P(f, g) = !(f === g)

this will instead produce false negatives (f is not the same as g, but on your test set they coincide)

There are other boolean expressions (apart from negation) that you can not use on results of === in tests. For instance, consider if (f === g) P(f, g) else R(f, g). And it is not limited to functions either.

@kailuowang
Copy link
Contributor

Thanks @alexknvl

@ceedubs
Copy link
Contributor Author

ceedubs commented May 17, 2017

Thanks @alexknvl. @edmundnoble is this something that we are doing in our tests? Also I guess the hope would be that with enough checks over enough rng seeds (a new one each build), false negatives would eventually be uncovered.

@kailuowang
Copy link
Contributor

To help facilitate the discussion. Here is the code
The test is Rules.isId("isEmpty", A.empty)(A.isEmpty)
which translate to

forAll {( x: A) => Eq[A].eqv(x, A.empty) ?== A.isEmpty(x) }

The default implementation of A.isEmpty is

Eq[A].eqv(a, A.empty)

So for instances that don't override the default implementation this test is equvilent to

forAll { ( x: A) => 
 Eq[A].eqv(x, A.empty) ?== Eq[A].eqv(x, A.empty)
}

@kailuowang kailuowang modified the milestone: 1.0.0-MF May 18, 2017
ceedubs added a commit to ceedubs/cats that referenced this pull request Oct 21, 2018
This is a work in progress and there is a bit more work that should
probably be done before merging this. However, I have already put a fair
amount of time into this and wanted to see what people thought about it
before pushing through to do all of the relevant work.

Cats has a lot of instances for function-like types. When we go to check
the laws for these types, we are required to supply an `Eq` instance.
But defining equality for functions is awkward. So far we've been
approaching this by generating a bunch of values and passing them into
both functions and ensuring that we get the same results from both. This
can produce false positives if two functions differ but we just happen
to sample values that produce the same output in both. For some
purposes, it isn't a big deal if we get some occasional false positives,
because over many runs of the tests with different RNG seeds, we should
eventually catch any discrepancies.

But here be dragons. Some tests use the results of these equality checks
on the left side of an implication, so a false positive in the equality
check becomes a false _negative_ (and thus a build failure) in the test result.
See [here](typelevel#1666 (comment)) for further discussion.

This is where my adventure with this PR begins. Cats builds have been
timing out quite a bit recently, so I tried to reduce the number of
random values that we sample when comparing two functions for equality.
While this did speed up the build a little bit, it started leading to a
much higher frequency of build failures due to false negatives in tests.

So I started to rethink how we determine function equivalence. Instead
of relying on nondeterministic behavior for equality, we can only
provide function equality for functions whose domains are small enough
to exhaustively check. If two functions produce the same output for the
entirety of their domain, they are equivalent.

I've introduced an `ExhaustiveCheck[A]` type class that is similar to
`Gen[A]` but produces a `Stream[A]` of the entire domain of `A`. I made
the name somewhat specific to tests as opposed to something like
`Finite[A]`, because types like `Int` have a finite domain but would be
prohibitively expensive to exhaustively check in tests and therefore
shouldn't have an instance for this type class.

I also added some `Eq` instances for function-like types whose domains
have `ExhaustiveCheck` instances. For the sake of compatibility, I
didn't remove the old `Eq` instances, but I've put them in a lower
implicit priority scope, and I've changed the sites that were using them
to use the new instances (well not quite all of them yet -- that's why
this PR isn't quite complete yet).

The main benefits of this change as I see it are:

1. Remove some nondeterministic behavior from the build.
2. Allow for shrinking of the number of values checked to improve build
times without triggering build failures.
3. Allow for future deprecation of some problematic instances that are
exposed through cats-laws but that users should probably not depend on.

The main potential downside that I can think of is that we might be
checking 15 examples where we were checking 50 before, which could be
considered a reduction in test coverage. However, I think that all of
the places where this sort of approach is used are parametric on the
type, so I don't think that it should matter much that the domain for
this type is much smaller.

Let me know what you think. If people like this approach then I can
switch over the remaining bits.
ceedubs added a commit to ceedubs/cats that referenced this pull request Oct 21, 2018
This is a work in progress and there is a bit more work that should
probably be done before merging this. However, I have already put a fair
amount of time into this and wanted to see what people thought about it
before pushing through to do all of the relevant work.

Cats has a lot of instances for function-like types. When we go to check
the laws for these types, we are required to supply an `Eq` instance.
But defining equality for functions is awkward. So far we've been
approaching this by generating a bunch of values and passing them into
both functions and ensuring that we get the same results from both. This
can produce false positives if two functions differ but we just happen
to sample values that produce the same output in both. For some
purposes, it isn't a big deal if we get some occasional false positives,
because over many runs of the tests with different RNG seeds, we should
eventually catch any discrepancies.

But here be dragons. Some tests use the results of these equality checks
on the left side of an implication, so a false positive in the equality
check becomes a false _negative_ (and thus a build failure) in the test result.
See [here](typelevel#1666 (comment)) for further discussion.

This is where my adventure with this PR begins. Cats builds have been
timing out quite a bit recently, so I tried to reduce the number of
random values that we sample when comparing two functions for equality.
While this did speed up the build a little bit, it started leading to a
much higher frequency of build failures due to false negatives in tests.

So I started to rethink how we determine function equivalence. Instead
of relying on nondeterministic behavior for equality, we can only
provide function equality for functions whose domains are small enough
to exhaustively check. If two functions produce the same output for the
entirety of their domain, they are equivalent.

I've introduced an `ExhaustiveCheck[A]` type class that is similar to
`Gen[A]` but produces a `Stream[A]` of the entire domain of `A`. I made
the name somewhat specific to tests as opposed to something like
`Finite[A]`, because types like `Int` have a finite domain but would be
prohibitively expensive to exhaustively check in tests and therefore
shouldn't have an instance for this type class.

I also added some `Eq` instances for function-like types whose domains
have `ExhaustiveCheck` instances. For the sake of compatibility, I
didn't remove the old `Eq` instances, but I've put them in a lower
implicit priority scope, and I've changed the sites that were using them
to use the new instances (well not quite all of them yet -- that's why
this PR isn't quite complete yet).

The main benefits of this change as I see it are:

1. Remove some nondeterministic behavior from the build.
2. Allow for shrinking of the number of values checked to improve build
times without triggering build failures.
3. Allow for future deprecation of some problematic instances that are
exposed through cats-laws but that users should probably not depend on.

The main potential downside that I can think of is that we might be
checking 15 examples where we were checking 50 before, which could be
considered a reduction in test coverage. However, I think that all of
the places where this sort of approach is used are parametric on the
type, so I don't think that it should matter much that the domain for
this type is much smaller.

Let me know what you think. If people like this approach then I can
switch over the remaining bits.
kailuowang pushed a commit that referenced this pull request Apr 2, 2019
)

* Don't depend on random sampling to determine function equivalence

This is a work in progress and there is a bit more work that should
probably be done before merging this. However, I have already put a fair
amount of time into this and wanted to see what people thought about it
before pushing through to do all of the relevant work.

Cats has a lot of instances for function-like types. When we go to check
the laws for these types, we are required to supply an `Eq` instance.
But defining equality for functions is awkward. So far we've been
approaching this by generating a bunch of values and passing them into
both functions and ensuring that we get the same results from both. This
can produce false positives if two functions differ but we just happen
to sample values that produce the same output in both. For some
purposes, it isn't a big deal if we get some occasional false positives,
because over many runs of the tests with different RNG seeds, we should
eventually catch any discrepancies.

But here be dragons. Some tests use the results of these equality checks
on the left side of an implication, so a false positive in the equality
check becomes a false _negative_ (and thus a build failure) in the test result.
See [here](#1666 (comment)) for further discussion.

This is where my adventure with this PR begins. Cats builds have been
timing out quite a bit recently, so I tried to reduce the number of
random values that we sample when comparing two functions for equality.
While this did speed up the build a little bit, it started leading to a
much higher frequency of build failures due to false negatives in tests.

So I started to rethink how we determine function equivalence. Instead
of relying on nondeterministic behavior for equality, we can only
provide function equality for functions whose domains are small enough
to exhaustively check. If two functions produce the same output for the
entirety of their domain, they are equivalent.

I've introduced an `ExhaustiveCheck[A]` type class that is similar to
`Gen[A]` but produces a `Stream[A]` of the entire domain of `A`. I made
the name somewhat specific to tests as opposed to something like
`Finite[A]`, because types like `Int` have a finite domain but would be
prohibitively expensive to exhaustively check in tests and therefore
shouldn't have an instance for this type class.

I also added some `Eq` instances for function-like types whose domains
have `ExhaustiveCheck` instances. For the sake of compatibility, I
didn't remove the old `Eq` instances, but I've put them in a lower
implicit priority scope, and I've changed the sites that were using them
to use the new instances (well not quite all of them yet -- that's why
this PR isn't quite complete yet).

The main benefits of this change as I see it are:

1. Remove some nondeterministic behavior from the build.
2. Allow for shrinking of the number of values checked to improve build
times without triggering build failures.
3. Allow for future deprecation of some problematic instances that are
exposed through cats-laws but that users should probably not depend on.

The main potential downside that I can think of is that we might be
checking 15 examples where we were checking 50 before, which could be
considered a reduction in test coverage. However, I think that all of
the places where this sort of approach is used are parametric on the
type, so I don't think that it should matter much that the domain for
this type is much smaller.

Let me know what you think. If people like this approach then I can
switch over the remaining bits.

* Remove ExhaustiveCheck.map method

* Fix InvariantMonoidal tests

* Fix tests with failing implicit resolution
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants