Build released compiler artifacts as optimized as possible #49180
To make sure we set expectations: the 10% perf number doesn't mean "10% slower", it means "executes 10% more instructions". A change in instruction counts is often an indicator that there could be a regression, but it does not translate to a 10% slowdown in literal wall time. For example, the wall-time measurements for that commit show the worst regression, percentage-wise, as 0.49s to 0.56s. Large benchmarks like servo-style-opt got at worst 3.8% slower in a clean build from scratch, going from 75 to 78 seconds. I point this out because reducing the number of codegen units, PGO, and those sorts of optimizations aren't really silver bullets. They're incredibly expensive optimizations that buy a few seconds here and there, as opposed to major optimizations across the board.
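The instruction-count vs. wall-time distinction above can be observed directly with Linux `perf`; this is a generic sketch (the binary name is a placeholder, not something from this thread):

```shell
# Measure both retired instructions and wall time for a workload.
# `./my-workload` is a placeholder binary, not from this thread.
perf stat -e instructions,task-clock ./my-workload

# Instruction counts are far less noisy run-to-run than wall time,
# which is why rustc's perf tracking reports them even though users
# ultimately care about seconds.
```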
@alexcrichton thanks for clarifying.
@alexcrichton Yes, I know that this won't make the compiler massively faster. On the other hand, it's not uncommon that we spend weeks of developer time on getting a 5% compile time improvement. If there's the opportunity of making the compiler 10% faster by letting a build machine chew on it for a few hours every six weeks, I think we should take it. That being said, I don't underestimate the complexity of our CI. I just don't want us to disregard the opportunity from the beginning. Maybe there is a simpler solution that would get us 90% of the way.
Moving to opt-level=3 can speed up the compiler by up to 2%, but it's blocked on a Windows codegen bug. See also: #48204.
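For reference, the knob in question corresponds to Cargo's `opt-level` profile setting; a minimal sketch for an ordinary crate (rustc's own build configures this through its bootstrap config rather than a Cargo profile, so this is only illustrative):

```toml
# Cargo.toml — illustrative sketch of the optimization-level knob.
[profile.release]
opt-level = 3   # the thread implies releases were built at level 2; see #48204
```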
@andjo403's comments on gitter have given me the idea that we could also try to build LLVM with PGO. I realize of course that this would require lots of new infrastructure support and isn't something that can be implemented quickly.
Some updates here:

I opened a separate issue for symbol ordering: #50655
@alexcrichton & @nnethercote: Thanks to you we have pipelining now, and our bootstrap time should be quite a bit shorter, right? (according to this: https://gistpreview.github.io/?74d799739504232991c49607d5ce748a) Can we switch the compiler back to `codegen-units=1`?
We're unfortunately way too close to 4 hours, and I think frequently going over as of today, to be able to afford going back to codegen-units=1. Pipelining doesn't help us too much on CI since we only have 2 cores currently, so we're not getting the advantage of -j28 like that graph shows :)
I am surprised that the simple rustc_codegen_utils takes 18s, while the far more complex rustc_codegen_ssa takes only 24s, in @michaelwoerister's timings.
😱
But as there are only 2 cores, are we sure that codegen-units=1 is not faster?
My understanding is that LLVM is faster at optimizing smaller modules (perhaps not entirely surprising, though certainly interesting). That means that splitting the same amount of IR into more codegen units can produce faster builds, even with just one core.
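The trade-off described here maps onto two Cargo profile settings; a hedged sketch for an ordinary crate (values chosen for illustration, matching the numbers discussed in this thread):

```toml
[profile.release]
# More codegen units means smaller LLVM modules and more parallelism,
# but less cross-module optimization unless ThinLTO stitches them back.
codegen-units = 16
lto = "thin"

# The alternative under discussion: one big module per crate, no
# ThinLTO step needed, best runtime performance, slowest (serial) compile.
# codegen-units = 1
# lto = false
```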
On the other hand we'd skip the entire ThinLTO step... let me give it a try locally.
I would personally agree with @Mark-Simulacrum that we're extremely strapped for time budget on CI right now, and the longest builders are the Windows release builders. We should be extremely careful about making them slower (aka losing parallelism). We're also hoping to get 4-core machines at some point, which may change the calculus in terms of whether 2 cores + pipelining gives us sufficient parallelism.
My local test for …
@michaelwoerister said this at the start:
From subsequent comments it seems like this point might be getting overlooked? We wouldn't do this for all CI builds, just those generating stable releases. How often are stable releases generated?
We build stable artifacts approximately once every 6 weeks. While I believe the CI platform we're currently on, Pipelines, does not have strict timeouts, I would rather avoid waiting more than the existing 4+ hours for a full stable build. Plus, optimizations in this area are plausibly likely to introduce regressions, right? Such cases might be rare, but I believe it is non-theoretical that changes to the codegen units used when building the compiler have caused bugs in the past, though I could be wrong about that.
I grepped through past PRs and I have no idea what the current state of distribution builds is: it seems the last documented change was #45444. What is the current state?
@nnethercote To add to what @Mark-Simulacrum already mentioned, I personally think we also derive a lot of value from stable/beta/nightly releases all being produced exactly the same way. That way we can exclude a class of bugs where stable releases are buggy due to how they're built but beta/nightly don't have the same bugs (for example, this would help prevent a showstopper bug on either beta or stable). There are also enough users of the non-stable channels that producing quite-fast compilers on nightly is relatively important.

If we build a full release every night, however, that's where it gets pretty onerous to make release builds slower. The cost would be paid at least once a day (multiple times for stable/beta), and that runs the risk of being even slower than we currently are, which is already sort of unbearably slow :(

@ishitatsuyuki I believe the current state is that libstd is built with one CGU, and all rustc crates are built with 16 CGUs and have ThinLTO enabled within each crate's set of CGUs.
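In rust-lang/rust itself these settings live in the bootstrap `config.toml`; a hedged sketch of what the state described above might look like (key names from bootstrap's example config, values taken from this comment — not the exact CI configuration):

```toml
# config.toml (rust-lang/rust bootstrap) — illustrative only.
[rust]
codegen-units = 16      # rustc crates: 16 CGUs with per-crate ThinLTO
codegen-units-std = 1   # libstd: a single CGU
```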
I agree that we should release what we regularly test. Thanks for pointing that out.
Here's a possibly interesting thought: PGO speeds up Firefox quite a bit (5-10%). Maybe it would be possible to harness PGO for our LLVM builds? We rebuild LLVM only very infrequently and fall back on a cached version the rest of the time, so we would just need a way to fill the cache with a PGO'ed version of LLVM (which is kind of complicated, I guess). Anyway, a starting point would be a local test to see whether there are actual performance improvements to be had.
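The general workflow being proposed looks roughly like the standard three-phase clang PGO cycle; a generic sketch using standard clang/LLVM flags (the file names and the workload are placeholders, and an actual rustc integration would drive this through LLVM's CMake build rather than by hand):

```shell
# 1. Build the program (here: some C++ code, standing in for LLVM)
#    with profiling instrumentation.
clang++ -O2 -fprofile-generate=/tmp/pgo-data -o app-instrumented app.cpp

# 2. Run a representative workload — for LLVM, that would be compiling
#    some large crates — which writes raw profiles into /tmp/pgo-data.
./app-instrumented

# 3. Merge the raw profiles and rebuild using them.
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
clang++ -O2 -fprofile-use=/tmp/pgo-data/merged.profdata -o app-final app.cpp
```

The tricky part alluded to above is not the flags but the infrastructure: caching the PGO'ed LLVM artifact so that regular CI builds can reuse it instead of paying the double-build cost every time.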
I don't know if this is the right venue in which to discuss @michaelwoerister's recent blog post, but I'd love to provide some feedback on my experiences enabling PGO for Firefox CI and the various lessons we learned along the way.
@luser I'd love to hear about your experiences with PGO for Firefox CI. I think that would be really valuable! I plan to create a tracking issue for using PGO.
My understanding is that there are two parts to this issue:

1. Building the released compiler with `codegen-units=1` again.
2. Using PGO when building rustc and/or LLVM.
@michaelwoerister is that an accurate summary? Do you still want to enable codegen-units=1? We have a lot more builder capacity than in the past; I think it would be feasible to turn it on unconditionally.
We also have this newer tracking issue, with more details and all the recent work done for the build config: #103595 |
Perfect, thanks! I'm going to close this issue as outdated and use #103595 for tracking these improvements. |
At the moment the compiler binaries that we release are not as fast and optimized as they could be. As of ff227c4, they are built with multiple codegen units and ThinLTO again, which makes the compiler around 10% slower than when built with a single CGU per crate. We really should be able to do better here: at a minimum, we should use `-Ccodegen-units=1` for stable releases.

@rust-lang/release @rust-lang/infra, how can we decouple builds of stable releases from the regular CI builds that are timing out so much lately? There should be a way of doing these builds without the severe time limits that we have in regular CI.