Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CPU time for --warn-long #36226

Merged
merged 37 commits into from
Oct 26, 2024
Merged

Conversation

orlitzky
Copy link
Contributor

@orlitzky orlitzky commented Sep 9, 2023

Finally address #32981.

Using wall time to measure which tests are "long" doesn't make much sense. Wall time is subject to every conflating factor:

  1. Faster hardware runs tests in less wall time
  2. Faster compiler flags makes code that runs in less wall time
  3. A busy machine will run tests slower than an unburdened one
  4. Operating system priority can increase or decrease the time taken by a process
  5. etc.

Ideally, everyone would agree on how long is too long for a test to run. Attaining that ideal is not so easy, but the first step is to switch--warn-long away from using wall time and towards using CPU time. A second phase (#33022) will normalize the CPU time.

Examples

--warn-long output

The user-facing portion of this can be tested with a low --warn-long value,

$ sage -t --warn-long=1 src/sage/rings/qqbar.py 
...
Doctesting 1 file.
sage -t --warn-long --random-seed=309742757431591512726158538133919135994 src/sage/rings/qqbar.py
**********************************************************************
File "src/sage/rings/qqbar.py", line 511, in sage.rings.qqbar
Warning, slow doctest:
    z, = [r for r in p4.roots(QQbar, False) if r in ival]
Test ran for 1.51s cpu, 1.51s wall
Check ran for 0.00s cpu, 0.00s wall
**********************************************************************
File "src/sage/rings/qqbar.py", line 524, in sage.rings.qqbar
Warning, slow doctest:
    z2 = QQbar.polynomial_root(p4, ival)
Test ran for 1.07s cpu, 1.07s wall
Check ran for 0.00s cpu, 0.00s wall
**********************************************************************
File "src/sage/rings/qqbar.py", line 808, in sage.rings.qqbar.AlgebraicField_common._factor_multivariate_polynomial
Warning, slow doctest:
    F = QQbar._factor_multivariate_polynomial(p)
Test ran for 2.38s cpu, 2.42s wall
Check ran for 0.00s cpu, 0.00s wall

...

pexpect accounting

Two different methods are used to account for the CPU time, one for linux, and one for everything else. This GAP command uses pexpect and should chew up a good bit of CPU time:

sage: from sage.doctest.util import Timer
sage: t = Timer()
sage: _ = t.start(); t.stop()    # no CPU time used between start/stop
{'cputime': 0.0, 'walltime': 7.605552673339844e-05}

sage: _ = t.start(); gap.eval('a:=List([0..10000],i->WordAlp("a",i));; IsSortedList(a);'); t.stop()    # lots of CPU time used
'true'
{'cputime': 30.089999999999996, 'walltime': 36.292088747024536}

@orlitzky orlitzky changed the title Use CPU time for --warn-long (Draft) Use CPU time for --warn-long Sep 9, 2023
@orlitzky orlitzky force-pushed the warn-long-cputime branch 3 times, most recently from 1f56451 to c3d1af6 Compare September 9, 2023 23:09
@orlitzky orlitzky changed the title (Draft) Use CPU time for --warn-long Use CPU time for --warn-long Sep 10, 2023
src/sage/doctest/util.py Outdated Show resolved Hide resolved
@@ -1467,7 +1475,7 @@ def report_failure(self, out, test, example, got, globs):
self._fakeout.start_spoofing()
return returnval

def report_overtime(self, out, test, example, got, *, check_duration=0):
def report_overtime(self, out, test, example, got, *, check_timer=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old keyword should be deprecated using sage.misc.decorators.rename_keyword, rather than just changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keyword wasn't simply renamed -- the type of argument that it takes was changed too. Typically that would be an unfriendly thing to do, but this is essentially a private method. No other part of the sage library uses it (it's referenced exactly once, by self), and since it's part of sage's custom doctest runner, it's hard to imagine a third party relying on these internal details. Frankly I'd like to change it to an underscore method now that you mention it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keyword wasn't simply renamed -- the type of argument that it takes was changed too. Typically that would be an unfriendly thing to do, but this is essentially a private method. No other part of the sage library uses it (it's referenced exactly once, by self), and since it's part of sage's custom doctest runner, it's hard to imagine a third party relying on these internal details.

Cool. Sage policy is still to deprecate it. So deprecate the old argument and add a new one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no way to translate the old argument into the new one. If we keep the old name but ignore its argument, then any code using this function will still be immediately broken, only more quietly and with more steps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you feel that strongly about not changing this method's signature, I can copy & paste the entire function into a new private method _report_overtime. And then we should do the same for _report_success. That will leave the old methods intact to be deprecated, but I really think we would be wasting our time.

It's going to take time to put the old functions back and to add the deprecation warnings. It will take time to re-review those changes. It's going to take time to run the doctests for two unused functions in the interim. And then eventually someone has to spend the time to remove the functions, and someone else has to spend the time to review that.

Unless there's an (even remotely) plausible reason why anyone would be calling this function in their own code, it's justifiable to bend the deprecation rule. Its intent has been respected. And to do otherwise is wasting our most valuable resource.

@jhpalmieri
Copy link
Member

Not your job here, but would it be worthwhile to have a summary of overly long doctests printed at the end of doctesting?

@orlitzky
Copy link
Contributor Author

Not your job here, but would it be worthwhile to have a summary of overly long doctests printed at the end of doctesting?

Yes, but my vote would be to collect all warnings into that report.

@github-actions
Copy link

github-actions bot commented Sep 29, 2023

Documentation preview for this PR (built with commit efa14c3; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@orlitzky
Copy link
Contributor Author

@mkoeppe I have two modularization failures (in repl & categories) that I'm sure you know how to fix. Both fail at,

[sagemath_categories-10.4.beta3] [spkg-check]   File "/sage/pkgs/sagemath-categories/.tox/sagepython-sagewheels-nopypi-norequirements/lib/python3.10/site-packages/sage/doctest/util.py", line 221, in _quick_cputime
  [sagemath_categories-10.4.beta3] [spkg-check]     from sage.interfaces.quit import expect_objects
  [sagemath_categories-10.4.beta3] [spkg-check] ModuleNotFoundError: No module named 'sage.interfaces.quit'

@orlitzky
Copy link
Contributor Author

The CI has the following test failing on Conda/macOS:

sage: from sage.doctest.util import Timer                                                                                  
sage: cputime = 0.0                                                                                                        
sage: try:                                                                                                                 
....:     cputime = Timer()._pid_cpu_seconds(1)                                                                            
....: except OSError:                                                                                                      
....:     pass                                                                                                             
sage: cputime >= 0.0                                                                                                       
True                                                                                                                       
sage: isinstance(cputime, float)                                                                                           
True

The failure is,

Failed example:
    isinstance(cputime, float)
Expected:
    True
Got:
    False

The function either raises an OSError, or should return a float divided by an integer, which I'm expecting to be a float itself. I could just cast it with return float(...), but if anyone with macOS can reproduce it, I'd rather know why I'm casting it.

@tornaria
Copy link
Contributor

The CI has the following test failing on Conda/macOS:

sage: from sage.doctest.util import Timer                                                                                  
sage: cputime = 0.0                                                                                                        
sage: try:                                                                                                                 
....:     cputime = Timer()._pid_cpu_seconds(1)                                                                            
....: except OSError:                                                                                                      
....:     pass                                                                                                             
sage: cputime >= 0.0                                                                                                       
True                                                                                                                       
sage: isinstance(cputime, float)                                                                                           
True

The failure is,

Failed example:
    isinstance(cputime, float)
Expected:
    True
Got:
    False

The function either raises an OSError, or should return a float divided by an integer, which I'm expecting to be a float itself. I could just cast it with return float(...), but if anyone with macOS can reproduce it, I'd rather know why I'm casting it.

Well, sure:

sage: cputime = 0.0
sage: isinstance(cputime, float)
False

You forgot (a) the preparser makes 0.0 into a RealLiteral and (b) a RealLiteral is not an instance of a float.

@orlitzky
Copy link
Contributor Author

You forgot (a) the preparser makes 0.0 into a RealLiteral and (b) a RealLiteral is not an instance of a float.

FFS. Thanks.

@orlitzky
Copy link
Contributor Author

Still the two modularization failures. Judging from the line,

include sage/misc/timing.p*             # walltime, cputime used in sage.doctest

in sagemath-objects/MANIFEST.in, we probably need to include sage.interfaces.quit there as well (because now sage.doctest uses it to obtain the list of running "expect" interfaces).

I dare not make this change myself. @mkoeppe y/n?

@orlitzky
Copy link
Contributor Author

Rebased again. I'll try to make that modularization fix myself I guess.

If anyone is tired of repeatedly having to split up the test runs in CI only to still get false positives due to timeouts, this is an important step towards addressing the root cause of those failures.

Like ZombieProcess... this can happen. And the doctest framework
shouldn't crash if a pexpect process was killed.
We're importing psutil and trying to catch the ImportError if one is
raised. But for that to work, it helps to put "import" inside of the
"try" block.
If SystemExit occurs during a doctest, we re-raise it. The same thing
should be done for SignalError (from cysignals). Signals can occur at
any time, and the doctest runner shouldn't eat them.

In particular, this can lead to a Heisenbug where a signal will
terminate a doctest example before the timer has a chance to annotate
the it with the elapsed walltime and cputime. In that case, we get
an "impossible" AttributeError later on when we try to compute the
total walltime used by the example.
@orlitzky
Copy link
Contributor Author

CI is finally green 💚 🍏 📗 🟢

Copy link
Contributor

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests report slow docstests although those are marked as "long time" (see annotations at the bottom in https://github.com/sagemath/sage/actions/runs/11373830486?pr=36226). Is this by design because those tests take longer than 30s? If yes, what is the plan with these "super long" tests?

Other than this point, the PR looks good to me.

# being rather generous here.
self.options.warn_long = 5.0
if self.options.long:
self.options.warn_long = 30.0

def second_on_modern_computer(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method can now be deleted, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it's a "public" function, so I didn't want to start an argument over the deprecation policy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is safe to consider everything in doctest as private. But as a compromise, at least add a deprecation warning so that this can be removed in 1year?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@orlitzky
Copy link
Contributor Author

The tests report slow docstests although those are marked as "long time" (see annotations at the bottom in https://github.com/sagemath/sage/actions/runs/11373830486?pr=36226). Is this by design because those tests take longer than 30s? If yes, what is the plan with these "super long" tests?

Right. Tests that are marked # long time are only run with sage -t --long, but that's rather independent of the warnings. Currently the doctest runner picks some "random," too-large number to set as the default --warn-long value. For example,

$ sage -t src/sage/env.py
...
sage -t --warn-long 77.9 --random-seed=43834777074638100050904209660685805841 src/sage/env.py

That number doesn't depend on --long, and even for # long time tests, it is much too large. Our guidelines say that a normal test should complete in under 1s, and a "long" test in 5s. As it stands, we never get warnings about non-long tests that run too long, because the --warn-long value is high enough to accommodate even badly-behaving # long time tests. We also get few warnings about the # long time tests, because again, the number is just too high.

I've changed this in two ways:

  1. The default --warn-long number depends on whether or not --long was used. This lets us detect slow non-long tests when --long is not used.
  2. Those --warn-long numbers are deterministic, and low enough to be useful.

While 30s is still "super long," I'm trying to avoid turning the test output into an unreadable mess of warnings. We have a lot of tests that are too slow because the current behavior doesn't catch them. My plan was to start at something no one could object to (30s) and then lower it slowly as the existing issues get fixed.

@tobiasdiez
Copy link
Contributor

Okay, this makes a lot of sense.

However, on CI you now have quite a bit of warnings at the end - which are usually unrelated to the PR. While this might stimulate people into helping making these tests quicker, I would prefer if the annotations are actually relevant to a given PR.

Can you make the warn-long times even higher on CI (check for the CI env variable should do)? Or otherwise hide those warnings for now?

This method of the DocTestController class is no longer used in the
sage library, but technically, it was "public," so we deprecate it for
later removal.
@orlitzky
Copy link
Contributor Author

Okay, this makes a lot of sense.

However, on CI you now have quite a bit of warnings at the end - which are usually unrelated to the PR. While this might stimulate people into helping making these tests quicker, I would prefer if the annotations are actually relevant to a given PR.

Can you make the warn-long times even higher on CI (check for the CI env variable should do)? Or otherwise hide those warnings for now?

Yeah I didn't notice those annotations. I guess github turns the word "Warning" into an error, and doesn't give you a way to disable it, selectively or otherwise. I would like to keep the "slow doctest" warnings from the CI, just not the fake errors. I pushed a commit that will change the "Warning, slow doctest" into "Uh oh, slow doctest" when running under the CI. If it works, we can still ctrl-f the logs for "slow doctest," but they won't get converted to error annotations.

@tobiasdiez
Copy link
Contributor

These annotations are generated here:

elif format == 'github':
# https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#using-workflow-commands-to-access-toolkit-functions
command = f'::error title={fail_msg}'
command += f',file={source.printpath}'
if output:
if m := re.search("## line ([0-9]+) ##\n-{40,100}\n(.*)", output, re.MULTILINE | re.DOTALL):
lineno = m.group(1)
message = m.group(2)
command += f',line={lineno}'
else:
message = output
# Urlencoding trick for multi-line annotations
# https://github.com/actions/starter-workflows/issues/68#issuecomment-581479448
message = message.replace('\n', '%0A')
else:
message = ""
command += f'::{message}'
log(command)

It might be enough to use notice instead of error as the reporting mechanism for too-long-runs. Then they still would show up as annotations but not in red (no idea how exactly they are displayed). Alternatively, don't add the file + line info for too-long-warnings in which case the would only show up in the github action summary page but not in the review/code display. I would be fine with both solutions.

@orlitzky
Copy link
Contributor Author

These annotations are generated here:

I looked at this but I didn't see a connection to the slow doctest warnings. The error annotations are added in _log_failure(), and shouldn't affect anything unless the test fails? I wanted to change the "Warning" label as an experiment, but not even that has worked. For whatever reason, os.environ.get("CI") is not returning "true" under the CI.

@tobiasdiez
Copy link
Contributor

These annotations are generated here:

I looked at this but I didn't see a connection to the slow doctest warnings. The error annotations are added in _log_failure(), and shouldn't affect anything unless the test fails?

No idea...but I assumed it would be generated there, since these annotations are "errors" (red) and not "warnings" (yellow). So if github would just magically parse "Warnings" then they would most likely be shown in yellow.

I wanted to change the "Warning" label as an experiment, but not even that has worked. For whatever reason, os.environ.get("CI") is not returning "true" under the CI.

Probably because the ci uses its own docker container and never sets CI as would be normally the case with github. Try to check for

ENV GITHUB_ACTIONS=1

@orlitzky
Copy link
Contributor Author

Ah, thanks. I see invocations like,

2024-10-20T13:04:25.5703322Z ##[command]/usr/bin/docker create --name b3360c32c4bf46369bb60408efada007_registry2_ae6c9c --label 00c75c --network github_network_152498b2c6d54a50922b8e3fb958afe5 --network-alias registry -p 5000:5000  -e GITHUB_ACTIONS=true -e CI=true registry:2

So maybe we just need to pass $CI through to the dockerfile as well. We'll know in an hour...

@orlitzky
Copy link
Contributor Author

Back to square one:

  • Github changes the warnings to error annotations, even if I change the word "warning" to "uh oh"
  • We are not in _log_failure, so the custom "github" report format is not playing a part (you can confirm with raise SystemExit or whatever)

I think I've done enough blind debugging of Microsoft's crap today.

Switch "Warning, slow doctest:" to "Warning: slow doctest:" so that
the leading "Warning: " is consistent with other warnings. This
ultimately prevents these slow doctest warnings from being converted
into error annotations on the Github CI.
@orlitzky
Copy link
Contributor Author

I finally got it to use warning annotations (and not errors) by switching the Warning, to Warning:

@orlitzky
Copy link
Contributor Author

@tobiasdiez is it OK with the warning annotations, or do you want me to suppress them further?

Copy link
Contributor

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if they wouldn't show up at all, but it's fine like this for me.

Feel free to set it to positive review, or wait a little bit if you still want a second pair of eyes to look at the code.

@orlitzky
Copy link
Contributor Author

Ok, thank you. I started working on this in December of 2021 so I'd quite like to take it off my list and start working on the follow-up issues.

@vbraun vbraun merged commit 7e4f7bd into sagemath:develop Oct 26, 2024
18 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants