Compilation is 350x slower in release for r68k #31381
I recall that the assumption cache (tracker?) caused some problems before... What are we using assumptions for, exactly? cc @dotdash, I guess :)
This looks like it's hitting the same problem as #28273, but applying a workaround for that still results in a pretty slow build, so there's probably more to this.
Seems that we lost some re-use for landing pads. The cleanup code grows quadratically with the number of calls that may unwind:

```llvm
clean_custom_:                                    ; preds = %unwind_custom_
  call void @_ZN33cpu..ops..handlers..OpcodeHandler10drop.2140517h03d4f9d015a167dbE(%"cpu::ops::handlers::OpcodeHandler"* %60)
  %82 = bitcast %"cpu::ops::handlers::OpcodeHandler"* %60 to i8*
  call void @llvm.memset.p0i8.i64(i8* %82, i8 29, i64 40, i32 8, i1 false)
  br label %clean_ast_6925_
; ...
clean_custom_12:                                  ; preds = %unwind_custom_11
  call void @_ZN33cpu..ops..handlers..OpcodeHandler10drop.2140517h03d4f9d015a167dbE(%"cpu::ops::handlers::OpcodeHandler"* %67)
  %91 = bitcast %"cpu::ops::handlers::OpcodeHandler"* %67 to i8*
  call void @llvm.memset.p0i8.i64(i8* %91, i8 29, i64 40, i32 8, i1 false)
  call void @_ZN33cpu..ops..handlers..OpcodeHandler10drop.2140517h03d4f9d015a167dbE(%"cpu::ops::handlers::OpcodeHandler"* %60)
  %92 = bitcast %"cpu::ops::handlers::OpcodeHandler"* %60 to i8*
  call void @llvm.memset.p0i8.i64(i8* %92, i8 29, i64 40, i32 8, i1 false)
  br label %clean_ast_6925_
```

The last landing pad block has 514 drop calls, so that makes 1 + 2 + ... + 514 = (514 * 515) / 2 = 132355 drop calls in total. No wonder this takes some time. (With a mitigation for the inlining assumption cache problem, I'm down to ~15 minutes on my box, though.)
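To make the quadratic shape concrete, here is a minimal Rust sketch of the kind of code that triggers it (all names are hypothetical; `Handler` stands in for `cpu::ops::handlers::OpcodeHandler`). Each call that may unwind gets its own landing pad, and the pad after call k has to drop all k values still live at that point:

```rust
// Hypothetical reduction of the r68k pattern. Without landing-pad re-use,
// the pad for call k re-emits drops for all k live handlers, so the total
// cleanup code is 1 + 2 + ... + n = n(n+1)/2 drop calls.
struct Handler(u64); // stand-in for cpu::ops::handlers::OpcodeHandler

impl Drop for Handler {
    fn drop(&mut self) {
        // corresponds to the @...OpcodeHandler10drop... calls above
    }
}

fn may_unwind(op: u64) -> Handler {
    Handler(op) // in the real crate, a constructor call that may panic
}

fn build_table() -> Vec<Handler> {
    // Every call below may unwind; in r68k a single function makes
    // hundreds of such calls (the last landing pad held 514 drops).
    let h1 = may_unwind(1); // landing pad here drops nothing yet
    let h2 = may_unwind(2); // landing pad here must drop h1
    let h3 = may_unwind(3); // landing pad here must drop h1 and h2
    vec![h1, h2, h3]
}

fn main() {
    let _table = build_table();
}
```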
Thanks for looking into this :)
@alexcrichton Are you sure about that? This happens on stable also. |
Aha, in that case it may not be a regression but a pre-existing bug, never mind! |
This has been present since 2014 AFAICT. I'll likely have a patch ready this evening.
If a new cleanup is added to a cleanup scope, the cached exits for that scope are cleared, so all previous cleanups have to be translated again. In the worst case this means we get N distinct landing pads, where the last one has N cleanups, the one before it N-1, and so on.

Since new cleanups are to be executed before older ones, we can instead cache the number of already-translated cleanups in addition to the block that contains them, translate only the new ones (if any), and then jump to the cached ones, getting away with linear growth instead.

For the crate in #31381 this reduces the compile time for an optimized build from >20 minutes (I cancelled the build at that point) to about 11 seconds. Testing a few crates that ship with rustc shows compile-time improvements of somewhere between 1 and 8%, the "big" winner being rustc_platform_intrinsics, which features code similar to that in #31381.

Fixes #31381
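For illustration, here is a hedged Rust sketch of the caching scheme the patch describes. This is a model, not the actual rustc trans code, and every name in it is hypothetical: the cache keeps both the landing-pad block and the number of cleanups it already covers, so a later request only emits blocks for the uncovered, newer cleanups and chains them in front of the cached block.

```rust
// Hypothetical model of the fix. `CachedLandingPad` remembers both the
// block and how many cleanups it already covers; adding a cleanup no
// longer invalidates the cache, it just leaves an uncovered suffix.
#[derive(Clone, Copy)]
struct BasicBlock(usize); // stand-in for an LLVM basic-block handle

struct CachedLandingPad {
    block: BasicBlock,
    cleanups_covered: usize, // how many cleanups `block` already runs
}

struct CleanupScope {
    cleanups: Vec<String>, // drop glue to run; newest last
    cached: Option<CachedLandingPad>,
}

impl CleanupScope {
    fn landing_pad(&mut self, mut fresh_block: impl FnMut() -> BasicBlock) -> BasicBlock {
        let covered = self.cached.as_ref().map_or(0, |c| c.cleanups_covered);
        // Newest cleanups must run first, so build the chain back to front:
        // each new block runs its cleanup and then branches to `next`,
        // which handles all older cleanups (ending in the cached block).
        let mut next = self.cached.as_ref().map(|c| c.block);
        for cleanup in &self.cleanups[covered..] {
            let block = fresh_block();
            // A real implementation would emit the drop call for `cleanup`
            // into `block`, then `br` to `next` (or `resume` if None).
            let _ = (cleanup, next);
            next = Some(block);
        }
        let entry = next.expect("scope has at least one cleanup");
        self.cached = Some(CachedLandingPad {
            block: entry,
            cleanups_covered: self.cleanups.len(),
        });
        entry
    }
}

fn main() {
    let mut ids = 0usize..;
    let mut fresh = || BasicBlock(ids.next().unwrap());
    let mut scope = CleanupScope { cleanups: vec!["drop h1".into()], cached: None };
    scope.landing_pad(&mut fresh);
    scope.cleanups.push("drop h2".into());
    // Only one new block is emitted here; it chains to the cached one.
    scope.landing_pad(&mut fresh);
}
```

Because each cleanup gets exactly one emitted block over the lifetime of the scope, total cleanup code grows linearly in the number of cleanups instead of quadratically.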
Hi,
When building this project https://github.com/marhel/r68k with `cargo build --release`, it takes 350x longer than a debug build. I tested this on both stable and nightly (on Mac) and the result seems to be the same.
Here is the original issue: marhel/r68k#60
I have also attached a performance capture (done with Xcode/Instruments on Mac), which I recorded for a while (not the whole duration of the compile, but it should hopefully give a hint about where most of the time is being spent).