-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustc_platform_intrinsics takes LLVM lots of time to compile #28273
Comments
We have a winner here:
That's pretty impressive... |
what command did you use? |
That's with |
Findings so far: This is caused by LLVM doing cost analysis on the caller if the caller has local linkage. |
Marking the three |
When the inliner has to decided if it wants to inline a function A into an internal function B, it first checks whether it would be more profitable to inline B into its callees instead. This means that it has to analyze B, which involves checking the assumption cache. Building the assumption cache requires scanning the whole function, and because inlining currently clears the assumption cache, this scan happens again and again, getting even slower as the function grows from inlining. As inlining the huge find functions isn't really useful anyway, we can mark them as noinline, which skips the cost analysis and reduces compile times by as much as 70%. cc rust-lang#28273
When the inliner has to decided if it wants to inline a function A into an internal function B, it first checks whether it would be more profitable to inline B into its callees instead. This means that it has to analyze B, which involves checking the assumption cache. Building the assumption cache requires scanning the whole function, and because inlining currently clears the assumption cache, this scan happens again and again, getting even slower as the function grows from inlining. As inlining the huge find functions isn't really useful anyway, we can mark them as noinline, which skips the cost analysis and reduces compile times by as much as 70%. cc #28273
Commit 9104a90 fixed the generated files, but that change would be lost (or require additional manual intervention) if they are re-generated of if new architectures are added. cc rust-lang#28273
This seems to have grown out of control in debuginfo builds, with this sample:
|
This commit improves the compile time of `rustc_platform_intrinsics` from 23s to 3.6s if compiling with `-O` and from 77s to 17s if compiling with `-O -g`. The compiled rlib size also drops from 3.1M to 1.2M. The wins here were gained by removing the destructors associated with `Type` by removing the internal `Box` and `Vec` indirections. These destructors meant that a lot of landing pads and extra code were generated to manage the runtime representations. Instead everything can basically be statically computed and shoved into rodata, so all we need is a giant string compare to lookup what's what. Closes rust-lang#28273
rustc: Improve compile time of platform intrinsics This commit improves the compile time of `rustc_platform_intrinsics` from 23s to 3.6s if compiling with `-O` and from 77s to 17s if compiling with `-O -g`. The compiled rlib size also drops from 3.1M to 1.2M. The wins here were gained by removing the destructors associated with `Type` by removing the internal `Box` and `Vec` indirections. These destructors meant that a lot of landing pads and extra code were generated to manage the runtime representations. Instead everything can basically be statically computed and shoved into rodata, so all we need is a giant string compare to lookup what's what. Closes #28273
This commit improves the compile time of `rustc_platform_intrinsics` from 23s to 3.6s if compiling with `-O` and from 77s to 17s if compiling with `-O -g`. The compiled rlib size also drops from 3.1M to 1.2M. The wins here were gained by removing the destructors associated with `Type` by removing the internal `Box` and `Vec` indirections. These destructors meant that a lot of landing pads and extra code were generated to manage the runtime representations. Instead everything can basically be statically computed and shoved into rodata, so all we need is a giant string compare to lookup what's what. Closes rust-lang#28273
rustc: Improve compile time of platform intrinsics This commit improves the compile time of `rustc_platform_intrinsics` from 23s to 3.6s if compiling with `-O` and from 77s to 17s if compiling with `-O -g`. The compiled rlib size also drops from 3.1M to 1.2M. The wins here were gained by removing the destructors associated with `Type` by removing the internal `Box` and `Vec` indirections. These destructors meant that a lot of landing pads and extra code were generated to manage the runtime representations. Instead everything can basically be statically computed and shoved into rodata, so all we need is a giant string compare to lookup what's what. Closes #28273
CPU:
with debuginfo:
without debuginfo:
We should either make this fast or write it to not trigger this worst-case.
cc @huonw
cc @dotdash
there was also a case of 600s compile-time on the same CPU, but I can't reproduce it.
The text was updated successfully, but these errors were encountered: