Reduce randomness in doctests #35522

Open · 1 task done · tobiasdiez opened this issue Apr 16, 2023 · 21 comments

@tobiasdiez (Contributor)

Is there an existing issue for this?

  • I have searched the existing issues for a bug report that matches the one I want to file, without success.

Problem Description

Currently, a few doctests use random input (e.g. random_element()), which leads to unpredictable doctest failures (since the tests might fail depending on the input) and unreliable code coverage metrics (since different code paths are executed depending on the input).

Proposed Solution

Replace random input with deterministic/static input. If a test requires many different inputs to be convincing, it can be refactored using pytest parameterized tests, as sketched below.
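
For illustration, a minimal sketch of such a refactoring (the function and inputs here are hypothetical, not taken from the Sage sources):

import pytest

# Stand-in for code that is currently exercised with random_element() inputs.
def double(n):
    return 2 * n

# Deterministic inputs exercise the same code paths on every run.
@pytest.mark.parametrize("n", [0, 1, -7, 2**31])
def test_double(n):
    assert double(n) == n + n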

Alternatives Considered

Accept random test failures and unreliable coverage metrics.

Additional Information

A good starting point to see which tests are particularly affected is to go to https://app.codecov.io/gh/sagemath/sage/pulls, pick a PR at random, and look at the "Indirect changes" tab. These are changes in the coverage metrics that (most likely) do not come from changes in the PR. For example, comparing https://app.codecov.io/gh/sagemath/sage/pull/35521/indirect-changes and https://app.codecov.io/gh/sagemath/sage/pull/35507/indirect-changes indicates that the tests in the following files are prone to lead to different coverage:

and a few more. This can then be compared to the codecov report for the develop branch. For example, the line https://app.codecov.io/gh/sagemath/sage/blob/develop/src/sage/modular/modsym/boundary.py#L1385 is not covered in the develop branch but is covered in the PRs.

@tornaria (Contributor)

Maybe codecov could be run with --random-seed=0 so it is more deterministic.

Also, is there a way you can set up the codecov/project check to be just "informational", i.e. report the same as now and offer a link to codecov, but make the test result a PASS (or neutral if that's an option) so the PR as a whole has a chance to PASS?

@tobiasdiez (Contributor, Author) commented Apr 18, 2023

> Maybe codecov could be run with --random-seed=0 so it is more deterministic.

I'm not sure about the other implications the random seed has, but sure, this would be a workaround. Feel free to open a PR if you think it's a good change. The following command needs to be changed:

../sage -python -m coverage run ./bin/sage-runtests --all -p2
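
With the suggested flag appended, the command would presumably become (assuming sage-runtests accepts --random-seed in this form):

../sage -python -m coverage run ./bin/sage-runtests --all -p2 --random-seed=0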

> Also, is there a way you can set up the codecov/project check to be just "informational", i.e. report the same as now and offer a link to codecov, but make the test result a PASS (or neutral if that's an option) so the PR as a whole has a chance to PASS?

There is https://docs.codecov.com/docs/commit-status#informational, but I'm not sure if this is a good idea. The coverage check is not a "required check" for merging, and only other devs can judge whether missing line coverage is acceptable or not. I agree a "warning" status would be better than a complete red failure, but sadly, GitHub only has a binary concept of pass or failure.
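
For reference, per the linked docs, the informational mode would be enabled by something like the following in codecov.yml (an untested sketch):

coverage:
  status:
    project:
      default:
        informational: true  # status always passes; coverage is still reported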

@kwankyu (Collaborator) commented Apr 21, 2023

> Is there a way you can set up the codecov/project check to be just "informational", i.e. report the same as now and offer a link to codecov, but make the test result a PASS (or neutral if that's an option) so the PR as a whole has a chance to PASS?

I agree with setting the "informational" flag on. The failure in codecov/project is not directly related to the PR and is insignificant. The coverage result (even from codecov/patch) was only "informational" in the trac era.

@tobiasdiez (Contributor, Author)

Can we try out the "random-seed" fix first? A PR should almost never decrease the code coverage, so I think it's a good idea to have the /project check fail in case the coverage does decrease.

@tornaria (Contributor)

> Can we try out the "random-seed" fix first? A PR should almost never decrease the code coverage, so I think it's a good idea to have the /project check fail in case the coverage does decrease.

It does, if I only add lines that are visited by # long time tests.

@kwankyu (Collaborator) commented Apr 21, 2023

As I understand it:

(1) We want to give a positive review only to a PR that passes all checks.
(2) It is very easy to decrease codecov/project because of random doctests.
(3) It is very easy to decrease codecov/patch because, except for simple PRs, we need many doctests to fully cover all code paths.

Then we have choices:

(a) If we want to keep the same less stringent level of code coverage as in the trac era, make both codecov/project and codecov/patch informational.

(b) If we aim for full code coverage only for the patch, make codecov/project informational.

(c) If we aim for full code coverage for both the project and the patch, do nothing but add more doctests.

(d) If we aim for full code coverage for both the project and the patch, do only the "random-seed" fix.

I think random-seed doctesting was introduced to make the code robust against varied inputs, which is the same aim as full code coverage. So it seems to me that (d) is self-contradictory.

@tornaria (Contributor)

Also, it seems to compare only percentages, so it's not that useful and is easily confused. If a patch adds coverage for some lines, then other lines that are no longer covered will not be found.

It'd be much more useful if we got a report on which lines were covered before but are not covered now. And if we keep it random (with the check informational for now), we might be able to figure out which lines randomly stop being tested, etc.
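
Such a report could presumably be scripted from two coverage JSON exports (a rough sketch; the file names are hypothetical and assume coverage.py's "coverage json" output format):

import json

def lost_coverage(base_json, pr_json):
    """Report lines covered in the base run but no longer covered in the PR run."""
    with open(base_json) as f:
        base = json.load(f)["files"]
    with open(pr_json) as f:
        pr = json.load(f)["files"]
    lost = {}
    for fname, data in base.items():
        before = set(data["executed_lines"])
        after = set(pr.get(fname, {}).get("executed_lines", []))
        if before - after:
            lost[fname] = sorted(before - after)
    return lost

print(lost_coverage("base.json", "pr.json"))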

@kwankyu (Collaborator) commented Apr 21, 2023

> Also, it seems to compare only percentages, so it's not that useful and is easily confused. If a patch adds coverage for some lines, then other lines that are no longer covered will not be found.

codecov works basically by counting the lines that are run by doctests and dividing that by the total number of lines. If a patch adds lines that are not run by the doctests, the coverage is sure to decrease. So, to avoid a decrease, we need many tests covering all code paths. We didn't do this in the trac era. If we want to change the policy, I think we need a sage-devel discussion (or voting...)
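
For instance (illustrative numbers only): if 900 of 1000 lines are covered, the project coverage is 90%; a patch that adds 10 new, untested lines drops it to 900/1010 ≈ 89.1%.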

@tornaria
Copy link
Contributor

> > Also, it seems to compare only percentages, so it's not that useful and is easily confused. If a patch adds coverage for some lines, then other lines that are no longer covered will not be found.
>
> codecov works basically by counting the lines that are run by doctests and dividing that by the total number of lines. If a patch adds lines that are not run by the doctests, the coverage is sure to decrease. So, to avoid a decrease, we need many tests covering all code paths. We didn't do this in the trac era. If we want to change the policy, I think we need a sage-devel discussion (or voting...)

My point was: in a given PR there could be some new lines of code that are not covered by tests, or some tests may be modified in such a way that old lines of code stop being covered. This, in principle, is what the check wants to catch. But maybe the same PR also adds some new coverage (for a different set of lines of code). Then the percentage may be the same, even though the PR does not maintain coverage.
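
For a concrete (made-up) illustration: a PR that newly covers 50 lines while 50 previously covered lines silently lose coverage leaves the overall percentage unchanged, so a purely percentage-based check cannot flag it.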

IMO there are two tests that should be made:
a. whether all the new lines of code are covered (here it's good to aim for 100%, but see * below);
b. whether all the old lines of code that were already covered are still covered.

(*) Not sure if we must require coverage with non-long tests... so either we need to run the whole test suite in long mode, or at least run the modified files in long mode (a possible command is sketched below).
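
For the last option, the invocation would presumably look something like this (the file path is a placeholder):

./sage -t --long --random-seed=0 src/sage/path/to/modified_file.py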

@kwankyu (Collaborator) commented Apr 22, 2023

> My point was: in a given PR there could be some new lines of code that are not covered by tests, or some tests may be modified in such a way that old lines of code stop being covered. This, in principle, is what the check wants to catch. But maybe the same PR also adds some new coverage (for a different set of lines of code). Then the percentage may be the same, even though the PR does not maintain coverage.

I see. Yes.

> ... or at least run the modified files in long mode.

+1

@kwankyu (Collaborator) commented Apr 22, 2023

> Can we try out the "random-seed" fix first? A PR should almost never decrease the code coverage, so I think it's a good idea to have the /project check fail in case the coverage does decrease.

Isn't the constantly failing codecov/project check a CI breakage? Why not apply a quick fix first, and then take the time to do the "random-seed" fix right?

On the other hand, I am not sure about the "random-seed" fix. There is really nothing broken with the (old) random-seed doctests. The problem is with our (new) code coverage tool.

@kwankyu (Collaborator) commented Apr 22, 2023

I am testing the quick fix #35544. I don't know how it works yet.

@tobiasdiez (Contributor, Author)

> > Can we try out the "random-seed" fix first? A PR should almost never decrease the code coverage, so I think it's a good idea to have the /project check fail in case the coverage does decrease.
>
> It does, if I only add lines that are visited by # long time tests.

Sorry for not being clear. Sure, right now it's not working as it should and there is room for improvement. What I meant is that after the random tests are fixed, the project test coverage should almost never decrease. At least I cannot come up with a scenario where a decrease is expected. More likely, a change accidentally disables some tests or code branches, and you do want to be warned about such situations, ideally by a failing check.

@tobiasdiez (Contributor, Author)

> IMO there are two tests that should be made: a. whether all the new lines of code are covered (here it's good to aim for 100%, but *); b. whether all the old lines of code that were already covered are still covered.

While not perfect, the codecov checks are trying to do exactly this. a) is covered by the "/patch" check and b) by the "/project" check.

@tobiasdiez (Contributor, Author)

> On the other hand, I am not sure about the "random-seed" fix. There is really nothing broken with the (old) random-seed doctests. The problem is with our (new) code coverage tool.

There are quite a few reports about "random" doctest failures that disappear when run again. That codecov picks up on these instances is a symptom of the same issue. In my opinion, randomness is not a good tool for testing different inputs, except in very exceptional situations.

@kwankyu (Collaborator) commented Apr 22, 2023

> > On the other hand, I am not sure about the "random-seed" fix. There is really nothing broken with the (old) random-seed doctests. The problem is with our (new) code coverage tool.
>
> There are quite a few reports about "random" doctest failures that disappear when run again.

Do you mean doctests that randomly fail? These are different from random doctests, which take random inputs. If a random doctest randomly fails, then the doctest is doing its job. No?

@tobiasdiez (Contributor, Author)

> > > On the other hand, I am not sure about the "random-seed" fix. There is really nothing broken with the (old) random-seed doctests. The problem is with our (new) code coverage tool.
> >
> > There are quite a few reports about "random" doctest failures that disappear when run again.
>
> Do you mean doctests that randomly fail? These are different from random doctests, which take random inputs. If a random doctest randomly fails, then the doctest is doing its job. No?

Sometimes yes, sometimes you just hit an edge case that was not accounted for when designing the test input (e.g. #35449). There are a few instances where random inputs are a good idea (advantages and disadvantages are, for example, briefly discussed at https://en.wikipedia.org/wiki/Monkey_testing). But usually you would like to complement them with static tests that cover all relevant code branches: if a random-input test discovers a bug, you add that input as a static test to make sure the bug doesn't resurface in the future. The flaky codecov reports suggest that some random tests often hit very different code paths that are not covered by static-input tests. Ideally, you would only rarely hit paths that are not covered by static tests. A sketch of this pattern follows.
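
To illustrate the pattern in doctest form (a hypothetical sketch, not an actual Sage test):

sage: p = random_prime(10^6)   # random input: a different prime on every run
sage: p.is_prime()
True
sage: Integer(2).is_prime()    # static input pinned after a random run hit this edge case
True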

@kwankyu (Collaborator) commented Apr 22, 2023

Testing of the quick fix #35544 succeeded: a PR decreasing codecov/project still passes the check.

My suggestion is to merge #35544 as a quick fix for broken CI.

@tobiasdiez (Contributor, Author)

Sorry, but I still don't get why this is better than the "random-seed" fix (nor why there is a need for a 'quick fix' instead of a proper fix).

@tornaria (Contributor) commented May 6, 2023

> Sorry, but I still don't get why this is better than the "random-seed" fix (nor why there is a need for a 'quick fix' instead of a proper fix).

Because meanwhile all PRs are getting a failing CI. The top priority should be that CI always works, so that the state of a PR is meaningful.

The coverage check is really useful, as long as it's possible to make it pass in a sensible and deterministic way! Otherwise we will just learn to ignore it (hiding other failures, because the state of a PR is a single bit which is the logical AND of all checks).

@kwankyu (Collaborator) commented May 8, 2023

I have learned to ignore the "codecov/project" check.

vbraun pushed a commit that referenced this issue May 28, 2023
    
### 📚 Description

Related to #35522 and #35544. Should at least fix the changed codecov
between runs.

### 📝 Checklist

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

URL: #35632
Reported by: Tobias Diez
Reviewer(s): Matthias Köppe