Reduce randomness in doctests #35522
Comments
Maybe the codecov job could be run with `--random-seed=0` so it is more deterministic. Also, is there a way you can set up the …
I'm not sure about the other implications. (See `.github/workflows/build.yml`, line 94 at 97b45d8.)
There is https://docs.codecov.com/docs/commit-status#informational, but I'm not sure if this is a good idea. The coverage check is not a "required check" for merging. And only other devs can judge whether missing line coverage is acceptable or not. I agree a "warning" status would be better than a complete red failure - but sadly, GitHub only has a binary concept of pass or failure.
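For reference, per the linked codecov documentation, making the status checks report-only would be configured in `codecov.yml` along these lines (a sketch; the exact keys should be checked against the docs):

```yaml
# codecov.yml -- sketch: make both checks report-only, never a red failure
coverage:
  status:
    project:
      default:
        informational: true  # report coverage, but never fail the check
    patch:
      default:
        informational: true
```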
I agree with setting the "informational" flag on. The failure in …
Can we try out the "random-seed" fix first? A PR should almost never decrease the code coverage, so I think it's a good idea to have the …
It does, if I only add lines that are visited by …
As I understand it: (1) We want to give positive review only to a PR that passes all checks. Then we have choices:

(a) If we want to keep the same, less stringent level of code coverage as in the trac era, make both codecov/project and codecov/patch informational.
(b) If we aim for full code coverage only for the patch, make codecov/project informational.
(c) If we aim for full code coverage for both the project and the patch, do nothing but add more doctests.
(d) If we aim for full code coverage for both the project and the patch, do only the "random-seed" fix.

I think random-seed doctesting was introduced for code robustness under varied inputs, which is the same aim as full code coverage. So it seems to me that (d) is self-contradictory.
Also, it seems to be comparing only percentages, so it's not that useful and is easy to confuse. If a patch adds coverage for some lines, then other lines that stop being covered will not be found. It'd be much more useful if we got a report on which lines were covered before but are not covered now. And if we keep it random (with the check informational for now), we might be able to figure out which lines randomly stop being tested, etc.
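Such a per-line report could in principle be generated from two coverage.py data files, one from the base branch and one from the PR. A minimal sketch, assuming two such files exist (the file paths are hypothetical; `CoverageData` is coverage.py's documented API):

```python
# Sketch: report lines that were covered on the base branch but are no
# longer covered on the PR branch. Assumes two coverage.py data files
# (paths are hypothetical) produced by identical doctest runs.
from coverage import CoverageData

def covered_lines(path):
    data = CoverageData(basename=path)
    data.read()
    return {f: set(data.lines(f) or ()) for f in data.measured_files()}

base = covered_lines(".coverage.base")
patch = covered_lines(".coverage.pr")

for filename in sorted(base):
    lost = base[filename] - patch.get(filename, set())
    if lost:
        print(f"{filename}: lines no longer covered: {sorted(lost)}")
```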
codecov works basically by counting the lines that are run by doctests and dividing that by the total number of lines. If a patch adds lines that are not run by the doctests, the coverage is sure to decrease. So for coverage not to decrease, we need many tests covering all code paths. We didn't do this in the trac era. If we want to change the policy, I think we need a sage-devel discussion (or voting...)
My point was: in a given PR there could be some new lines of code that are not covered by tests, or some tests are modified in such a way that old lines of code stop being covered. This, in principle, is what the check wants to catch. But maybe the same PR also adds some new coverage (for a different set of lines of code). Then the percentage may be the same, but the PR does not maintain coverage. IMO there are two tests that should be made: a) every line added or modified by the PR is covered, and b) lines that were covered before the PR are still covered.
I see. Yes.
+1
Isn't the constantly-failing codecov/project check a CI breakage? Why not apply a quick fix first, and then take the time to do the "random-seed" fix right? On the other hand, I am not sure about the "random-seed" fix. There is really nothing broken with the (old) random-seed doctests. The problem is with our (new) code coverage tool.
I am testing the quick fix #35544. I don't know how it works yet. |
Sorry for not being clear. Sure, right now it's not working as it should and there is room for improvement. What I meant is that after the random tests are fixed, the project test coverage should almost never decrease. At least I cannot come up with a scenario where a decrease is expected. More likely is that a change accidentally disables some tests or code branches, and you do want to be warned about such situations, ideally by a failing check.
While not perfect, the codecov checks are trying to do exactly this: a) is covered by the "/patch" check and b) by the "/project" check.
There are quite a few reports about "random" doctest failures that disappear when run again. That codecov picks up on these instances is a symptom of the same issue. In my opinion, randomness is not a good tool for testing different inputs, except in very exceptional situations.
Do you mean doctests that randomly fail? These are different from random doctests, which take random inputs. If a random doctest randomly fails, then the doctest is doing its job. No?
Sometimes yes; sometimes you just hit an edge case that was not accounted for when designing the test input (e.g. #35449). There are a few instances where random inputs are a good idea (advantages and disadvantages are, for example, briefly discussed at https://en.wikipedia.org/wiki/Monkey_testing). But usually you would want to complement them with static tests that cover all relevant code branches: say a random-input test discovers a bug; then you add the input of that test as a static test to make sure the bug doesn't resurface in the future. The flaky codecov reports suggest that some random tests often hit very different code paths, which are not covered by static-input tests. Ideally you would only rarely hit paths that are not covered by static tests.
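That pattern, a seeded random test for breadth plus a pinned regression test, might look like this as a sketch (the function under test and all inputs here are hypothetical):

```python
# Sketch of the pattern described above: a seeded random-input test for
# breadth, plus a static regression test pinning an input that once
# exposed a bug. `my_function` and the sample inputs are hypothetical.
import random

def my_function(x):
    return x * x  # stand-in for the code under test

def test_random_input():
    # A fixed seed keeps the "random" exploration reproducible across runs.
    rng = random.Random(0)
    for _ in range(100):
        x = rng.randint(-10**6, 10**6)
        assert my_function(x) >= 0

def test_regression_negative_input():
    # Input that a random run once found to fail, now pinned permanently.
    assert my_function(-3) == 9
```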
Sorry, but I still don't get why this is better than the "random-seed" fix (nor why there is a need for a "quick fix" instead of a proper fix).
Because meanwhile all PRs are getting a failing CI. The top priority should be that CI always works, so that the state of a PR is meaningful. The coverage check is really useful, as long as it's possible to make it pass in a sensible and deterministic way! Otherwise we will just learn to ignore it (hiding other failures in the process, because the state of a PR is a single bit: the logical AND of all checks).
I learned to ignore the "codecov/project" check.
### 📚 Description

Related to #35522 and #35544. Should at least fix the changed codecov between runs.

### 📝 Checklist

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [ ] I have created tests covering the changes.
- [ ] I have updated the documentation accordingly.

### ⌛ Dependencies

URL: #35632
Reported by: Tobias Diez
Reviewer(s): Matthias Köppe
Is there an existing issue for this?
Problem Description
Currently a few doctests use random input (e.g. `random_element()`), which leads to unpredictable doctest failures (since depending on the input the tests might fail) and unreliable code coverage metrics (since depending on the input different code paths are executed).

Proposed Solution
Replace random input by deterministic/static input. If a test requires many different inputs to be convincing, it can be refactored using pytest parametrized tests, as sketched below.
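As an illustration, a minimal sketch of such a refactoring (the function under test and the inputs are hypothetical; `pytest.mark.parametrize` is standard pytest):

```python
# Sketch: replacing a random-input test with static, parametrized inputs.
# `square_is_nonnegative` is a hypothetical stand-in for a property that
# a doctest might currently check via random_element().
import pytest

def square_is_nonnegative(x):
    return x * x >= 0  # stand-in for the code under test

@pytest.mark.parametrize("x", [0, 1, -1, 7, -12345, 10**9])
def test_square_is_nonnegative(x):
    # Each input becomes its own deterministic, individually reported test.
    assert square_is_nonnegative(x)
```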
Alternatives Considered
Accept random test failures and unreliable coverage metrics.
Additional Information
A good starting point to see which tests are particularly affected is to go to https://app.codecov.io/gh/sagemath/sage/pulls, pick a PR at random, and look at the "Indirect changes" tab. These are changes in the coverage metrics that (most likely) do not come from changes in the PR. For example, comparing https://app.codecov.io/gh/sagemath/sage/pull/35521/indirect-changes and https://app.codecov.io/gh/sagemath/sage/pull/35507/indirect-changes indicates that the tests in the following files are prone to lead to different coverage:
and a few more. This can then be compared to the codecov report for the develop branch. For example, this line https://app.codecov.io/gh/sagemath/sage/blob/develop/src/sage/modular/modsym/boundary.py#L1385 is not covered in the develop branch but is covered in the PRs.