rustc_platform_intrinsics takes LLVM lots of time to compile #28273

Closed
arielb1 opened this issue Sep 6, 2015 · 7 comments
Labels
I-compiletime Issue: Problems and improvements with respect to compile times.

Comments

Contributor

arielb1 commented Sep 6, 2015

CPU:

model name  : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
stepping    : 3
microcode   : 0x12
cpu MHz     : 3392.293
cache size  : 8192 KB

with debuginfo:

time: 0.580; rss: 228MB translation
  time: 0.228; rss: 172MB       llvm function passes
  time: 48.802; rss: 287MB      llvm module passes
  time: 17.807; rss: 454MB      codegen passes
  time: 0.001; rss: 454MB       codegen passes
time: 67.451; rss: 454MB        LLVM passes

without debuginfo:

time: 0.467; rss: 221MB translation
  time: 0.179; rss: 167MB       llvm function passes
  time: 30.995; rss: 222MB      llvm module passes
  time: 10.347; rss: 342MB      codegen passes
  time: 0.000; rss: 342MB       codegen passes
time: 41.762; rss: 342MB        LLVM passes

We should either make LLVM handle this quickly or rewrite the crate so that it does not trigger this worst case.

cc @huonw
cc @dotdash

There was also a case of a 600s compile time on the same CPU, but I can't reproduce it.

arielb1 added the I-compiletime label Sep 6, 2015
Contributor

dotdash commented Sep 7, 2015

We have a winner here:

  161.8920 ( 66.8%)   0.0340 (  4.9%)  161.9260 ( 66.6%)  161.6847 ( 66.6%)  Function Integration/Inlining
   8.2570 (  3.4%)   0.0270 (  3.9%)   8.2840 (  3.4%)   8.2553 (  3.4%)  Induction Variable Simplification
   6.8030 (  2.8%)   0.0100 (  1.4%)   6.8130 (  2.8%)   6.7934 (  2.8%)  Induction Variable Users

That's pretty impressive...

Contributor Author

arielb1 commented Sep 7, 2015

what command did you use?

Contributor

dotdash commented Sep 7, 2015

That's with -Ztime-llvm-passes

Contributor

dotdash commented Sep 7, 2015

Findings so far: This is caused by LLVM doing cost analysis on the caller if the caller has local linkage.

Contributor

dotdash commented Sep 8, 2015

Marking the three find() functions as #[inline(never)] stops us from triggering a pathological case in the inliner and reduces compile times by almost 70%.
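
A minimal sketch of what that annotation looks like, assuming a generated lookup function shaped roughly like a large `match` on the intrinsic name. The function body, the intrinsic names, and the `u32` ids are hypothetical placeholders, not the crate's real tables:

```rust
/// Hypothetical stand-in for one of the generated per-architecture `find()`
/// functions. The real ones are far larger and return a richer description.
///
/// `#[inline(never)]` asks the compiler not to inline this function into its
/// callers, so the inliner has no reason to run its cost analysis on it.
#[inline(never)]
pub fn find(name: &str) -> Option<u32> {
    match name {
        // Placeholder entries: a real table has hundreds of arms.
        "x86_mm_add_ps" => Some(1),
        "x86_mm_sub_ps" => Some(2),
        "x86_mm_mul_ps" => Some(3),
        _ => None,
    }
}
```

Callers are unchanged: `find("x86_mm_add_ps")` still returns `Some(1)` here, and only the inlining decision differs.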

dotdash added a commit to dotdash/rust that referenced this issue Sep 11, 2015
When the inliner has to decide whether to inline a function A into an
internal function B, it first checks whether it would be more profitable
to inline B into its callers instead. This means that it has to analyze
B, which involves checking the assumption cache. Building the assumption
cache requires scanning the whole function, and because inlining
currently clears the assumption cache, this scan happens again and
again, getting even slower as the function grows from inlining.

As inlining the huge find functions isn't really useful anyway, we can
mark them as noinline, which skips the cost analysis and reduces compile
times by as much as 70%.

cc rust-lang#28273
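
The growth described in this commit message can be illustrated with a toy cost model. This is only a sketch under stated assumptions: it counts "instructions scanned" for a caller that grows by a fixed amount per inlined call, and it is not LLVM's actual implementation. The function names and parameters are made up for illustration.

```rust
/// Toy model: the cache is cleared by every inlining step, so each decision
/// rescans the whole caller, which has grown since the previous step.
pub fn scanned_with_rebuild(initial: u64, added_per_inline: u64, inlines: u64) -> u64 {
    let mut size = initial;
    let mut scanned = 0;
    for _ in 0..inlines {
        scanned += size; // full rescan of the caller
        size += added_per_inline; // the caller grows after inlining
    }
    scanned
}

/// Toy model: the cache survives inlining, so the caller is scanned once and
/// afterwards only the newly inlined instructions are visited.
pub fn scanned_incremental(initial: u64, added_per_inline: u64, inlines: u64) -> u64 {
    initial + added_per_inline * inlines
}
```

With `initial = 100`, `added_per_inline = 50` and `inlines = 4`, the rebuild model scans 100 + 150 + 200 + 250 = 700 instructions, while the incremental model scans 300. The first total grows quadratically in the number of inlined calls, the second linearly.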
bors added a commit that referenced this issue Sep 11, 2015
ranma42 added a commit to ranma42/rust that referenced this issue Sep 12, 2015
Commit 9104a90 fixed the generated
files, but that change would be lost (or require additional manual
intervention) if they are re-generated or if new architectures are
added.

cc rust-lang#28273
bors added a commit that referenced this issue Sep 13, 2015
Contributor Author

arielb1 commented Jan 31, 2016

This seems to have grown out of control in debuginfo builds, with this sample:

#0  0x00002b853d8e37f5 in (anonymous namespace)::LiveDebugValues::VarLoc::operator==((anonymous namespace)::LiveDebugValues::VarLoc const&) const ()
   from /tmp/tmp.Vfq0tEURq9/rust/build-debug-assertions/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_llvm-db5a760f.so
#1  0x00002b853d8e644c in (anonymous namespace)::LiveDebugValues::ExtendRanges(llvm::MachineFunction&) ()
   from /tmp/tmp.Vfq0tEURq9/rust/build-debug-assertions/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_llvm-db5a760f.so
#2  0x00002b853e0bcd7a in llvm::FPPassManager::runOnFunction(llvm::Function&) ()
   from /tmp/tmp.Vfq0tEURq9/rust/build-debug-assertions/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_llvm-db5a760f.so
#3  0x00002b853e0bd0db in llvm::FPPassManager::runOnModule(llvm::Module&) ()
   from /tmp/tmp.Vfq0tEURq9/rust/build-debug-assertions/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_llvm-db5a760f.so
#4  0x00002b853e0bd3d0 in llvm::legacy::PassManagerImpl::run(llvm::Module&) ()
   from /tmp/tmp.Vfq0tEURq9/rust/build-debug-assertions/x86_64-unknown-linux-gnu/stage1/bin/../lib/librustc_llvm-db5a760f.so
#5  0x00002b853cfe8861 in LLVMRustWriteOutputFile (Target=0x2b85441d3550, 
    PMR=0x2b855d18f280, M=0x2b85472daab0, path=<optimised out>, 
    FileType=llvm::TargetMachine::CGFT_ObjectFile)

Member

aturon commented Feb 26, 2016

cc @michaelwoerister

alexcrichton added a commit to alexcrichton/rust that referenced this issue Mar 14, 2016
This commit improves the compile time of `rustc_platform_intrinsics` from 23s to
3.6s if compiling with `-O` and from 77s to 17s if compiling with `-O -g`. The
compiled rlib size also drops from 3.1M to 1.2M.

The wins here were gained by removing the destructors associated with `Type` by
removing the internal `Box` and `Vec` indirections. These destructors meant that
a lot of landing pads and extra code were generated to manage the runtime
representations. Instead everything can basically be statically computed and
shoved into rodata, so all we need is a giant string compare to look up what's
what.

Closes rust-lang#28273
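
A rough sketch of the idea in this commit message, assuming a simplified `Type` with only three variants. `OwnedType`, `StaticType`, the statics, and `lookup` are hypothetical names for illustration; the real crate's definitions differ.

```rust
/// Owning representation: `Box` and `Vec` give the type drop glue, so code
/// that builds values of it needs cleanup paths (landing pads).
pub enum OwnedType {
    Int(u8),
    Pointer(Box<OwnedType>),
    Aggregate(Vec<OwnedType>),
}

/// Borrowing representation: only `&'static` references, so there is no
/// destructor and values can live in read-only static data.
#[derive(Debug, PartialEq)]
pub enum StaticType {
    Int(u8),
    Pointer(&'static StaticType),
    Aggregate(&'static [&'static StaticType]),
}

// Everything is computed at compile time. Nothing is allocated at runtime.
static I32: StaticType = StaticType::Int(32);
static PTR_I32: StaticType = StaticType::Pointer(&I32);
static PAIR: StaticType = StaticType::Aggregate(&[&I32, &PTR_I32]);

/// The lookup reduces to a string comparison returning a static reference.
pub fn lookup(name: &str) -> Option<&'static StaticType> {
    match name {
        "i32" => Some(&I32),
        "ptr_i32" => Some(&PTR_I32),
        "pair" => Some(&PAIR),
        _ => None,
    }
}
```

`std::mem::needs_drop::<OwnedType>()` is `true` while `std::mem::needs_drop::<StaticType>()` is `false`, which is the property the commit relies on to avoid generating destructor code.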
bors added a commit that referenced this issue Mar 15, 2016
rustc: Improve compile time of platform intrinsics
alexcrichton added a commit to alexcrichton/rust that referenced this issue Mar 16, 2016
bors added a commit that referenced this issue Mar 16, 2016
rustc: Improve compile time of platform intrinsics
bors added a commit that referenced this issue Jul 18, 2018
Enable default inlining in platform intrinsics

Since #28273 has been fixed for quite some time, it might be a good idea to return to default inlining in platform intrinsics.