Track mypy performance changes automatically #14187
Comments
I've started working on this. Some details:
Related: hauntsaninja/mypy_primer#30
Additional context: The mypyc benchmarks are run on a dedicated host that has been tweaked for precise performance measurements (e.g. no turbo boost, automatic updates disabled). I hope that we will be able to detect performance regressions below 0.5% (stretch goal: detect a 0.1% regression).
Based on initial results, detecting regressions that are under 0.5% seems hard. A 1.0% regression is probably clearly visible, and a 0.5% regression might be visible. Perhaps we can improve precision by tweaking the measurement logic (e.g. by dropping outliers), but it's not clear yet. I'll perform a few additional experiments.
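To make the "dropping outliers" idea concrete, here is a minimal sketch of a trimmed mean over repeated timings. The function names, the number of runs, and the 20% trim fraction are assumptions for illustration; the thread does not say how the benchmark runner actually filters samples.

```python
import statistics
import time
from typing import Callable, List


def measure(func: Callable[[], object], runs: int = 15) -> List[float]:
    """Return the wall-clock duration in seconds of each of `runs` calls."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def trimmed_mean(timings: List[float], trim_fraction: float = 0.2) -> float:
    """Mean after dropping the fastest and slowest `trim_fraction` of samples.

    The trim fraction is a hypothetical default, not a value from the thread.
    """
    if not timings:
        raise ValueError("need at least one timing")
    if not 0.0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(timings)
    k = int(len(ordered) * trim_fraction)  # samples to drop from each end
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)
```

A single slow run, for example one hit by a background process, then no longer shifts the reported value, at the cost of needing more runs per commit.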
@JukkaL I think it would be helpful to type check repositories other than mypy. Here are the 20 slowest repositories to type check from mypy_primer (which also contains the exact commands and deps used).
If I had to pick five from these, I'd maybe do:
I notice that wall time is used. Maybe CPU instruction count would have less noise? I know it's what stuff like Rust's perf site uses by default (https://perf.rust-lang.org). See also stuff like rust-lang/rust#104646 (comment). I'm not sure it would be as useful, given that, as I understand it, some amount of mypy time is spent in IO.
Mypy does some IO and it would be good to track regressions related to IO as well, so this doesn't sound ideal for mypy.
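The trade-off can be illustrated with Python's standard clocks. This is only a sketch: it compares wall time with CPU time of the current process, which is a coarser stand-in for the instruction counts mentioned above (those need hardware counters and are not shown here). `time_both` and `Timing` are hypothetical names.

```python
import time
from typing import Callable, NamedTuple


class Timing(NamedTuple):
    wall: float  # elapsed real time, includes waiting on IO
    cpu: float   # CPU time used by this process, excludes waiting


def time_both(func: Callable[[], object]) -> Timing:
    """Run func once and report both wall-clock and CPU time in seconds."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return Timing(wall=wall, cpu=cpu)


if __name__ == "__main__":
    # Sleeping stands in for blocking IO: wall time grows, CPU time barely moves.
    print(time_both(lambda: time.sleep(0.1)))
```

A regression that only adds waiting time shows up in `wall` but not in `cpu`, which is why a CPU-only metric would miss IO-related slowdowns.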
Agreed, this would be nice! I'm focusing on mypy at first, but we could add some extra repositories once the whole thing works reliably for mypy.
I tried a few different things to get more precise results, but I'm still stuck at about a 1% noise floor. It's still enough to detect major performance regressions. I'm now collecting historical data, over the most recent 1000 mypy commits or so. It will take about two weeks of calendar time to complete (if everything goes smoothly). I will also experiment with collecting performance data about interpreted mypy runs, in the hope of getting more accurate timings.
I now have collected performance data for about one year of commits, though the most recent commits are still missing. Here's the data: https://github.com/mypyc/mypyc-benchmark-results/blob/master/reports/benchmarks/mypy_self_check.md Performance has regressed by about 50% over a year, so it's time to focus on fixing the regressions! I hope that in the future we can keep the overall level of slowdown under 10% in any year (and preferably under 5%) so that perceived mypy performance stays at least roughly the same over time, when accounting for hardware improvements. Hardware has improved about 5-10% per year for single-threaded workloads in the last 10 years, on average. The noise floor is about 1.5%. Perhaps with some smoothing we can bring it closer to 1.0%. Anyway, it's already good enough to find major regressions and improvements. I started looking at reducing the impact of 48c4a47 some time ago. This was a previously known regression that got me interested in finding other regressions like that.
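One possible form of the smoothing mentioned above is to compare the median timing in a window of commits before a point with the median in a window after it. The sketch below is an assumption about how that could look, not the logic used to produce the linked report; `find_step_regressions`, the window size, and the 3% threshold are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def find_step_regressions(
    timings: Sequence[float],
    window: int = 5,
    threshold: float = 0.03,
) -> List[Tuple[int, float]]:
    """Return (commit index, relative slowdown) pairs that exceed `threshold`.

    `timings` is one measurement per commit, oldest first. Medians over
    `window` commits on each side damp single noisy measurements.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    flagged = []
    for i in range(window, len(timings) - window + 1):
        before = statistics.median(timings[i - window:i])
        after = statistics.median(timings[i:i + window])
        change = after / before - 1.0
        if change > threshold:
            flagged.append((i, change))
    return flagged
```

A real step change tends to flag a few adjacent indices around the offending commit, so the result narrows the search rather than pinpointing a single commit. An isolated spike in one measurement is ignored because it does not move either median.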
Here are some optimizations based on the performance data: #14316
These changes are very nice! Would it be possible to get a preview of whether a PR introduces a performance regression before merging it (a la mypy_primer)? I'm not sure if that's planned or not, but IMO it would be useful.
I agree that this would be useful. I don't have immediate plans to implement this, since this would likely require an additional dedicated runner host to avoid noisy results on shared infra.
I was looking at some other topics and found scala/scala-dev#338. Presumably most viable things listed there have already been tried (given e.g. turbo boost has already been disabled), but it's incredibly comprehensive and it would be great to catch even smaller regressions!
I'm closing this now, as we have basic tracking of performance changes. I'll create another issue to track further improvements. In particular, it would be great to get the noise floor down to below 0.5%.
Currently it's difficult to find the root cause of a performance regression. We could automatically profile each mypy commit and generate a report with historical performance information, similar to how we track the performance of mypyc (e.g. https://github.com/mypyc/mypyc-benchmark-results/blob/master/reports/summary-main.md).
Add another benchmark to the mypyc benchmarks runner that type checks a fixed version of mypy using the current compiled mypy.
The benchmark will live in https://github.com/mypyc/mypyc-benchmarks.
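As a rough sketch of what such a benchmark could do, the function below times repeated runs of an external command. It is not the code in mypyc-benchmarks; `time_command` is a hypothetical helper, and the actual mypy command line and the pinned mypy version are not given in this issue, so they are left to the caller.

```python
import subprocess
import sys
import time
from typing import List, Optional, Sequence


def time_command(
    cmd: Sequence[str],
    runs: int = 5,
    cwd: Optional[str] = None,
) -> List[float]:
    """Run `cmd` `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises on a non-zero exit status, so a failed run is
        # never recorded as if it were a valid timing.
        subprocess.run(
            cmd,
            cwd=cwd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command; a real benchmark would invoke the compiled mypy
    # under test against a checkout of the fixed mypy version.
    print(time_command([sys.executable, "-c", "pass"], runs=3))
```

Keeping the checked code fixed means that changes in the timings reflect changes in mypy itself rather than changes in the code being type checked.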