Track performance regressions in CI #25262
Comments
comment:1
I think we have to work with the Advanced API (https://docs.python.org/2/library/doctest.html#advanced-api) and hook into [...]
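For concreteness, here is a minimal sketch (nothing that exists in Sage; the `TimingRunner` class and the usage lines are made up for illustration) of one way to hook into the doctest Advanced API and record per-doctest timings:

```python
import doctest
import time

class TimingRunner(doctest.DocTestRunner):
    """Run doctests and record how long each DocTest takes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timings = {}

    def run(self, test, **kwargs):
        start = time.perf_counter()
        result = super().run(test, **kwargs)
        self.timings[test.name] = time.perf_counter() - start
        return result

# Hypothetical usage: time all doctests found in some module.
# runner = TimingRunner(verbose=False)
# for test in doctest.DocTestFinder().find(some_module):
#     runner.run(test)
# print(sorted(runner.timings.items(), key=lambda kv: -kv[1]))
```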
comment:2
This is great, and I'm happy to help! We're already using the advanced API. See |
comment:3
Just to repeat something I have said before: measuring timings is the easy part. The hard part is doing something useful with those timings.
comment:4
Duplicate of #12720.
comment:5
I don't think this is a duplicate. This is about integrating speed regression checks into CI (GitLab CI, CircleCI). Please reopen.
comment:7
Replying to @jdemeyer:
That's what airspeed velocity is good for.
comment:8
Great! It's an excellent tool and I've wanted to see it used for Sage for a long time, but wasn't sure where to begin. In case it helps, I know and have worked with its creator personally.
comment:10
Replying to @saraedum:
Well, I'd love to be proven wrong. I thought it was just a tool to benchmark a given set of commands across versions and display fancy graphs.
comment:12
Not just across versions but across commits, even (though I think you can change the granularity). Here are Astropy's ASV benchmarks: http://www.astropy.org/astropy-benchmarks/. There are numerous benchmark tests for various common and/or time-critical operations. For example, we can track how coordinate transformations perform over time (which is one example of complex code that can fairly easily be thrown into bad performance by just a few small changes somewhere).
comment:14
Update milestone: 8.3 -> 8.4
Author: Julian Rüth |
comment:15
Adding this to all doctests is probably hard and would require too much hacking on asv. It's probably best to use the tool as it was intended to be used. Once #24655 is in, I would like to set up a prototype within Sage. Any area that you would like to have benchmarked from the start?
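For reference, using the tool "as intended" means adding a benchmarks directory with classes whose `time_*` methods asv discovers and times. A minimal sketch (the file name and the polynomial operations are placeholder examples, not an agreed-upon starting area):

```python
# benchmarks/polynomials.py -- hypothetical file name
class PolynomialArithmetic:
    def setup(self):
        # setup() runs before the timed methods and is not itself timed.
        from sage.all import QQ, PolynomialRing
        R = PolynomialRing(QQ, 'x')
        self.f = R.random_element(degree=200)
        self.g = R.random_element(degree=200)

    def time_multiplication(self):
        self.f * self.g

    def time_gcd(self):
        self.f.gcd(self.g)
```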
comment:16
Replying to @saraedum:
This is the "hard part" that I mentioned in [comment:3]. Ideally, we shouldn't have to guess where regressions might occur; the tool would do that for us. I believe that the intention of #12720 was to integrate this in the doctest framework such that all(?) doctests would also be regression tests. But that's probably not feasible, so here is a more productive answer:
Replying to @saraedum:
Adding a new method for each regression test sounds quite heavy. Would it be possible to integrate this in doctests instead? I would love to do [...]
comment:18
Replying to @saraedum:
I didn't realize you were trying to do that. And yeah, I think benchmarking every test would be overkill and would produce too much noise to be useful. Better to write specific benchmark tests, and also add new ones as regression tests whenever some major performance regression is noticed.
comment:45
Replying to @jdemeyer:
I see. I think it would be easy to track lines that say, e.g., [...]. Let me try to start with the benchmarking of blocks that say [...].
comment:46
Just two cents, without having thought too much about it. I like the [...]. I'd rather have a different annotation than [...]. Of course, at this stage using [...]. Thanks!
comment:47
Replying to @jdemeyer:
Yes, something like that could be done. Again, it all comes down to providing a different benchmark discovery plugin for ASV. For discovering benchmarks in our doctests, all lines leading up to a [...]. Multiple [...]. It might be trickier to do this in such a way that avoids duplication, but I'll think about that. I think it could still be done.
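As a rough illustration of the discovery idea (purely a sketch under assumptions: the marker string and the helper name are made up, and this is not an actual ASV plugin):

```python
import doctest

MARKER = "# benchmark"  # hypothetical marker; nothing has been decided yet

def discover_doctest_benchmarks(docstring, name, globs=None):
    """Yield (benchmark_name, callable) pairs for docstrings containing MARKER."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(docstring, globs or {}, name, None, 0)
    if any(MARKER in example.source for example in test.examples):
        runner = doctest.DocTestRunner(verbose=False)

        def run_block(test=test, runner=runner):
            # Run the whole block so that the marked lines see all the
            # setup lines leading up to them.
            runner.run(test)

        yield (name, run_block)
```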
comment:48
I think that this is wonderful. Since I have tried to improve the performance of certain things recently, and will likely continue to do so, I would like to add doctests for speed regressions already now. Should I use [...]?
comment:49
Thanks for the feedback. Replying to @mantepse:
Nothing has been decided upon yet. I could imagine something like [...]
comment:50
Presumably time benchmarking is more usual than memory benchmarking, so I would tend to [...]. For memory usage, do you foresee using fine-grained tools that instrument the code and actually slow down the execution? Otherwise, could "benchmark" just do both always?
comment:51
I would actually like [...]. Of course, I agree time benchmarks are going to be the most common, so we could still have [...]
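For context on the time-vs-memory question: asv itself already separates the two kinds by method prefix (`time_`, `mem_`, `peakmem_`), so a single annotation could in principle map onto both. A minimal sketch, with a placeholder matrix example that is only for illustration:

```python
class MatrixBenchmarks:
    def setup(self):
        from sage.all import ZZ, random_matrix
        self.m = random_matrix(ZZ, 200, 200)

    def time_multiply(self):
        # Timed by asv because of the time_ prefix.
        self.m * self.m

    def peakmem_multiply(self):
        # Peak memory of the process, because of the peakmem_ prefix.
        self.m * self.m
```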
comment:52
Once we're past this deliverable due date, I'll spend some more time poking at ASV to get the features we would need in it to make it easier to extend how benchmark collection is performed, and also to integrate it more directly into our existing test runner.
comment:53
Replying to @embray:
I very much like this (well-informed!) proposal.
comment:54
What is the status of this ticket? There is a branch attached. So, is it really new? Are people working on it? For the record, I too think that having [...]
comment:55
Right now we need to get the GitLab CI pipeline going again. I need to see about getting some more build runners up and running; it's been on my task list for ages. That, or we need to get more time from GCE (if anyone knows anyone at Google or other cloud computing providers who can help get CPU time donated to the project, it would be very helpful).
Branch pushed to git repo; I updated commit sha1. New commits:
comment:58
Now that the CI seems to be mostly stable (except for the docbuild timing out for [...]), I would like to get a minimal version of this working somehow. We should probably not attempt to get the perfect solution in the first run. The outputs this created are actually quite useful already, imho. If our contributors actually end up looking at the results, we can add more features (more keywords, more iterations, memory benchmarking, comparisons to other CAS, …). So, my proposal would be to go with this version (modulo cleanup & documentation & CI integration). If somebody wants to improve/reimplement this in a better way, I am very happy to review that later. I am not sure how much time I will have to work on this, so if anybody wants to get more involved, please let me know :)
Work Issues: documentation, doctests, CI |
Changed keywords from none to ContinuousIntegration |
comment:61
Rebased. New commits:
Changed branch from u/saraedum/25262 to public/airspeed_velo |
comment:62
This needs adaptation to Python 3, apparently.
comment:63
I am thinking about reviving this issue with a different application in mind that is a bit easier than regression testing: namely, to get a better understanding of how different values for the [...]. I find that we rarely update the default algorithms. However, this could be quite beneficial, say, when we upgrade a dependency such as PARI or FLINT. It would be very nice to easily see how the different algorithms perform after an update, and also to have a way to document the instances that have been used to determine the cutoffs that we are using. Currently, we are using some homegrown solutions for this, e.g., [...]
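One possible shape for this would be asv's parametrized benchmarks sweeping the `algorithm=` keyword and a few sizes. A sketch under assumptions: the charpoly example and the algorithm names below are only for illustration and may differ across Sage versions, not a vetted list:

```python
from copy import copy

class CharpolyAlgorithms:
    # asv runs every combination of these parameter values.
    params = [['linbox', 'generic'], [100, 200, 400]]
    param_names = ['algorithm', 'size']

    def setup(self, algorithm, size):
        from sage.all import ZZ, random_matrix
        self.m = random_matrix(ZZ, size, size)

    def time_charpoly(self, algorithm, size):
        # Work on a copy so that Sage's caching of charpoly results
        # does not turn repeated timings into cache lookups.
        copy(self.m).charpoly(algorithm=algorithm)
```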
comment:64
What is actually the problem with the original goal?
comment:65
Replying to @mantepse:
There's no fundamental problem. But doing the CI setup is quite a bit of work.
CC @roed314 @seblabbe @alexjbest @mezzarobba. @roed314 and I started to work on this again at Sage Days 117.
Sorry that I missed the discussion. I'm happy to help too, but will have very little time for that after the end of the Sage Days. |
I am currently playing with airspeed velocity to track speed regressions in Sage. I would like to benchmark every doctest that has a `long time` or `benchmark` marker in it and also benchmark every method that has a `time_` prefix (probably only in some benchmark module). We have something similar set up for https://github.com/MCLF/mclf/tree/master/mclf/benchmarks now. There are only two benchmarks, but it works nicely.
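To make that concrete, a doctest picked up by this proposal might look as follows (a hypothetical snippet reusing the existing `# long time` tag, not an actual doctest from the library):

```
sage: n = factorial(10^6)      # long time
sage: n.ndigits() > 5*10^6     # long time
True
```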
I ran the above proposal for all the tags from 8.3.beta0 to 8.3. There's a lot of noise (because there was other activity on the machine) but you get the idea: https://saraedum.github.io/sage/
Another interesting demo of airspeed velocity that is not related to Sage is here: https://pv.github.io/numpy-bench/#/regressions
Depends on #24655
CC: @roed314 @embray @nthiery @koffie @videlec
Component: doctest framework
Keywords: ContinuousIntegration
Work Issues: documentation, doctests, CI
Author: Julian Rüth
Branch/Commit: public/airspeed_velo @ 68869ae
Issue created by migration from https://trac.sagemath.org/ticket/25262