Take testing seriously #73

mcabbott · 2022-01-12T22:49:46Z

This adds many tests from Zygote. I've done some filtering to leave out the most repetitive ones, things which clearly don't apply (like Params), and things testing rules for external packages.

All which didn't pass are marked broken... ~~with (the local version of) #68. Thus many will fail.~~ Some have the error copied next to them. Exactly two have SILENTLY WRONG ANSWER in the comment (and should probably become issues) but this is not automatically distinguished from just an error.

~~Probably we should move the existing tests here into another file or two, to be a bit more organised.~~ (done) ~~I also had somewhere a few test cases of rules in ChainRules which are meant to support higher derivatives, but not very well tested there to do so.~~ (some added)

Closes #67, by adding the minimal piracy needed to allow Pairs to have a NamedTuple gradient (since many others tests failed without this).

Keno · 2022-01-12T22:57:17Z

ForwardDiff and TaylorSeries are also sources of tests since we have and want to explicitly support forward mode.

mcabbott · 2022-01-12T23:02:20Z

Good point. Ideally as many as possible of these would run all ways. The extent of my attempt at this so far is these lines , which ran into #70 .

Edit: All of ForwardDiff's tests are copied here, and have loops set up to run them with all combinations of reverse jacobain over forward Hessian, etc. But the exotic combinations mostly don't work at the moment.

mcabbott · 2022-01-15T19:26:37Z

These tests really want a different @test_broken macro, which accepts an error as a broken test, but fails on false -- a silently wrong answer is much worse than an error. Has anyone got one in their desk drawer?

ToucheSir · 2022-06-29T17:22:41Z

I looked into this a while back for an unrelated issue, but it seems the functionality behind those macros is not terribly extensible. Perhaps worthy of creating a TestMacros or TestExtras package if one doesn't already exist?

codecov-commenter · 2022-09-18T18:42:27Z

Codecov Report

Merging #73 (1250a47) into main (55d2871) will increase coverage by 2.42%.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##             main      #73      +/-   ##
==========================================
+ Coverage   55.54%   57.96%   +2.42%     
==========================================
  Files          21       21              
  Lines        2157     2172      +15     
==========================================
+ Hits         1198     1259      +61     
+ Misses        959      913      -46

Impacted Files	Coverage Δ
src/runtime.jl	`78.26% <66.66%> (-1.74%)`	⬇️
src/extra_rules.jl	`57.98% <100.00%> (+10.27%)`	⬆️
src/stage1/recurse.jl	`96.25% <0.00%> (-0.86%)`	⬇️
src/interface.jl	`70.90% <0.00%> (ø)`
src/stage1/recurse_fwd.jl	`94.28% <0.00%> (ø)`
src/stage1/generated.jl	`78.47% <0.00%> (+0.19%)`	⬆️
src/stage1/forward.jl	`80.13% <0.00%> (+5.98%)`	⬆️
src/tangent.jl	`60.00% <0.00%> (+25.95%)`	⬆️
src/stage1/broadcast.jl	`92.30% <0.00%> (+92.30%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

oscardssmith · 2023-02-13T22:45:12Z

should this be updated and merged?

mcabbott · 2023-02-14T03:00:27Z

Will circle back, but last time I tried to update this, so many things were broken that I stopped.

oscardssmith · 2023-02-14T03:03:35Z

makes sense. Right now first order scalar forward mode is probably the thing that is most reliable.

staticfloat · 2023-02-27T22:14:40Z

These tests really want a different @test_broken macro, which accepts an error as a broken test, but fails on false -- a silently wrong answer is much worse than an error. Has anyone got one in their desk drawer?

For what it's worth, this was fixed in JuliaLang/julia#47804, which will ship in Julia v1.10

mcabbott · 2023-02-27T22:59:03Z

Thanks. But this isn't quite what's wanted here, which is to prioritise silently wrong answers. It does this:

julia> @test_broken error()  # this I want to accept -- broken functionality
Test Broken
  Expression: error()

julia> @test_broken false  # this I would like to fail -- it's a silently wrong gradient
Test Broken
  Expression: false

julia> @test_broken 1  # this changed in 47804, ok
Error During Test at REPL[832]:1
  Expression evaluated to non-Boolean

staticfloat · 2023-02-27T23:08:10Z

You don't want @test_broken then, you want @test_throws:

julia> @test_throws ErrorException error()
Test Passed
      Thrown: ErrorException

julia> @test_throws ErrorException false
Test Failed at REPL[5]:1
  Expression: false
    Expected: ErrorException
  No exception thrown
ERROR: There was an error during testing

julia> @test_throws ErrorException 1
Test Failed at REPL[6]:1
  Expression: 1
    Expected: ErrorException
  No exception thrown
ERROR: There was an error during testing

mcabbott · 2023-02-27T23:14:36Z

Maybe. My hope was that an upgrade which changes an error to a correct answer would cause warnings (to fix at leisure) not failure, but changing an error to false would cause warnings (as now you have silently wrong answers). Maybe describing that as a variant of @test_broken is weird, since "Unexpected Pass" is a failure not a warning.

mcabbott force-pushed the tests branch from 279be12 to 9928324 Compare January 12, 2022 22:52

mcabbott force-pushed the tests branch 2 times, most recently from fe14fa2 to bfef4f6 Compare January 15, 2022 17:23

mcabbott force-pushed the tests branch 6 times, most recently from 7f06b80 to b72f72b Compare September 18, 2022 18:21

mcabbott force-pushed the tests branch from b72f72b to 1250a47 Compare September 18, 2022 19:07

mcabbott marked this pull request as ready for review September 18, 2022 19:10

mcabbott added 5 commits December 27, 2022 22:11

adapt many tests from Zygote

8215253

add tests from ForwardDiff

4484888

fixes

0840a82

move all old tests

f2a8a77

fixes + tests

2b679ea

mcabbott force-pushed the tests branch from 1250a47 to 2b679ea Compare December 28, 2022 03:53

nmheim mentioned this pull request Jan 29, 2024

Start testing reverse mode #254

Merged

mcabbott closed this Mar 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Take testing seriously #73

Take testing seriously #73

mcabbott commented Jan 12, 2022 •

edited

Loading

Keno commented Jan 12, 2022

mcabbott commented Jan 12, 2022 •

edited

Loading

mcabbott commented Jan 15, 2022

ToucheSir commented Jun 29, 2022

codecov-commenter commented Sep 18, 2022 •

edited

Loading

oscardssmith commented Feb 13, 2023

mcabbott commented Feb 14, 2023

oscardssmith commented Feb 14, 2023

staticfloat commented Feb 27, 2023 •

edited

Loading

mcabbott commented Feb 27, 2023

staticfloat commented Feb 27, 2023

mcabbott commented Feb 27, 2023

Take testing seriously #73

Take testing seriously #73

Conversation

mcabbott commented Jan 12, 2022 • edited Loading

Keno commented Jan 12, 2022

mcabbott commented Jan 12, 2022 • edited Loading

mcabbott commented Jan 15, 2022

ToucheSir commented Jun 29, 2022

codecov-commenter commented Sep 18, 2022 • edited Loading

Codecov Report

oscardssmith commented Feb 13, 2023

mcabbott commented Feb 14, 2023

oscardssmith commented Feb 14, 2023

staticfloat commented Feb 27, 2023 • edited Loading

mcabbott commented Feb 27, 2023

staticfloat commented Feb 27, 2023

mcabbott commented Feb 27, 2023

mcabbott commented Jan 12, 2022 •

edited

Loading

mcabbott commented Jan 12, 2022 •

edited

Loading

codecov-commenter commented Sep 18, 2022 •

edited

Loading

staticfloat commented Feb 27, 2023 •

edited

Loading