Building with LTO should skip "compilation" #43212

glandium · 2017-07-13T10:52:14Z

See #43211 for some horrifying timings from doing LTO builds in Firefox.

Please correct me where I'm wrong, but here is my understanding of the situation wrt building with LTO (and more or less confirmed by @alexcrichton and @mbrubeck on irc):

- Cargo starts building all dependencies
- For each dependency, the rust compiler creates an rlib
- The rlib contains compiled code for the crate, as well as metadata about the crate.
- When linking the main crate with LTO, the rust compiler uses the metadata from the dependee's rlibs, and compiles based on that and the code in the current crate. As I understand it, at this point, all the code that was compiled and put in those rlibs is not used.

In simplified and C/C++ terms, this is my understanding of what's happening:

Let's say we have a test.c that is built in a libtest library, and linked with foo.c into a foo binary.
The libtest library is generated with:
- gcc -o test.o -c test.c -O3 (that's the compiled code part)
- gcc -o test.lto.o -c test.c -O3 -flto (that's the metadata used for LTO)
- gcc-ar cr libtest.a test.lto.o test.o
The code for the main binary is generated with:
- gcc -o foo.lto.o -c foo.c -O3 -flto
- (maybe rust even compiles the code here too? like gcc -o foo.o -c foo.c -O3)
- gcc -flto -o foo foo.lto.o libtest.a

In the above, the fact is, if libtest.a only contained test.lto.o, the foo binary would still compile fine, because the compiled code is not used. Which means we've spent time generating that test.o for nothing.

Now, consider a crate like geckoservo: it contains only 3Kloc, so you wouldn't expect it to take as long to build as it does (well above a minute). @mbrubeck suggested that compiling the crate inlines a bunch of stuff, which is probably what is happening. But that seems like completely wasted time, considering it all has to be done again when linking the entire project.

FWIW, the -Ztime-passes output for geckoservo, with the latest 1.20 nightly, looks like:

time: 0.011; rss: 32MB  parsing
time: 0.000; rss: 32MB  recursion limit
time: 0.000; rss: 32MB  crate injection
time: 0.000; rss: 32MB  plugin loading
time: 0.000; rss: 32MB  plugin registration
time: 0.243; rss: 134MB expansion
time: 0.000; rss: 134MB maybe building test harness
time: 0.000; rss: 134MB maybe creating a macro crate
time: 0.000; rss: 134MB checking for inline asm in case the target doesn't support it
time: 0.001; rss: 134MB early lint checks
time: 0.000; rss: 134MB AST validation
time: 0.015; rss: 137MB name resolution
time: 0.001; rss: 137MB complete gated feature checking
time: 0.005; rss: 140MB lowering ast -> hir
time: 0.001; rss: 138MB indexing hir
time: 0.000; rss: 138MB attribute checking
time: 0.000; rss: 135MB language item collection
time: 0.001; rss: 135MB lifetime resolution
time: 0.000; rss: 135MB looking for entry point
time: 0.000; rss: 135MB looking for plugin registrar
time: 0.000; rss: 135MB loop checking
time: 0.000; rss: 135MB static item recursion checking
time: 0.016; rss: 136MB compute_incremental_hashes_map
time: 0.000; rss: 136MB load_dep_graph
time: 0.000; rss: 136MB stability index
time: 0.002; rss: 136MB stability checking
time: 0.004; rss: 137MB type collecting
time: 0.000; rss: 137MB impl wf inference
time: 0.000; rss: 137MB coherence checking
time: 0.000; rss: 137MB variance testing
time: 0.009; rss: 138MB wf checking
time: 0.009; rss: 140MB item-types checking
time: 0.366; rss: 185MB item-bodies checking
time: 0.024; rss: 185MB const checking
time: 0.002; rss: 186MB privacy checking
time: 0.001; rss: 186MB intrinsic checking
time: 0.000; rss: 186MB effect checking
time: 0.005; rss: 186MB match checking
time: 0.001; rss: 186MB liveness checking
time: 0.076; rss: 193MB borrow checking
time: 0.000; rss: 193MB reachability checking
time: 0.001; rss: 193MB death checking
time: 0.000; rss: 193MB unused lib feature checking
time: 0.011; rss: 193MB lint checking
time: 0.000; rss: 193MB resolving dependency formats
  time: 0.009; rss: 194MB       write metadata
  time: 0.569; rss: 279MB       translation item collection
  time: 0.041; rss: 298MB       codegen unit partitioning
  time: 0.022; rss: 748MB       internalize symbols
time: 6.012; rss: 748MB translation
time: 0.000; rss: 748MB assert dep graph
time: 0.000; rss: 748MB serialize dep graph
  time: 4.810; rss: 712MB       llvm function passes [0]
  time: 79.068; rss: 958MB      llvm module passes [0]
  time: 21.767; rss: 929MB      codegen passes [0]
  time: 0.001; rss: 929MB       codegen passes [0]
time: 107.035; rss: 929MB       LLVM passes
time: 0.000; rss: 929MB serialize work products

i.e. most of the time is spent in the llvm module passes and codegen passes.
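To put a number on that, here is a minimal Python sketch that parses an excerpt of the output above and computes how much of the top-level time goes to the "LLVM passes" entry. It assumes that indented lines are sub-timings already included in a top-level entry, which is an inference from the layout of the log rather than something stated here; the function names are made up for this example.

```python
import re

# Excerpt copied from the -Ztime-passes output above (not the full log).
LOG = """\
time: 0.243; rss: 134MB expansion
time: 0.366; rss: 185MB item-bodies checking
time: 0.076; rss: 193MB borrow checking
  time: 0.569; rss: 279MB       translation item collection
time: 6.012; rss: 748MB translation
  time: 4.810; rss: 712MB       llvm function passes [0]
  time: 79.068; rss: 958MB      llvm module passes [0]
  time: 21.767; rss: 929MB      codegen passes [0]
time: 107.035; rss: 929MB       LLVM passes
"""

LINE = re.compile(r"^(\s*)time:\s*([\d.]+);\s*rss:\s*(\d+)MB\s+(.*)$")


def parse_time_passes(text):
    """Return (indented, seconds, name) tuples for each recognised line."""
    entries = []
    for raw in text.splitlines():
        m = LINE.match(raw)
        if m:
            indent, seconds, _rss, name = m.groups()
            entries.append((bool(indent), float(seconds), name.strip()))
    return entries


def top_level_share(entries, name):
    """Fraction of the summed top-level time spent in the pass called `name`.

    Assumption: indented entries are nested inside a top-level entry,
    so only non-indented lines are added up.
    """
    top = [(s, n) for indented, s, n in entries if not indented]
    total = sum(s for s, _ in top)
    return sum(s for s, n in top if n == name) / total


entries = parse_time_passes(LOG)
share = top_level_share(entries, "LLVM passes")
print(f"LLVM passes: {share:.1%} of the top-level time in this excerpt")
```

Feeding the full log instead of the excerpt changes the result only slightly, since the omitted top-level entries are all in the millisecond range.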

Cc: @froydnj @rillian


alexcrichton · 2017-07-13T14:07:22Z

cc @michaelwoerister

michaelwoerister · 2017-07-13T15:56:31Z

Yes, this is one of the cases where MIR-only RLIBs would help.

michaelwoerister · 2017-07-14T16:45:13Z

I'll think some more about this next week. Maybe there's something we can do already with the compiler's current capabilities.

michaelwoerister · 2017-07-18T10:46:20Z

> When linking the main crate with LTO, the rust compiler uses the metadata from the dependee's rlibs, and compiles based on that and the code in the current crate. As I understand it, at this point, all the code that was compiled and put in those rlibs is not used.

So this is not entirely true. The rlib's metadata also contains optimized LLVM bitcode and when building the main crate, this bitcode is linked into the main LLVM module for that crate. After all bitcode from all rlib dependencies is linked together into one humongous module, we let LLVM run another set of optimizations on it. This is what the LTO linker plugin would normally do.

Consequently, the optimization passes for intermediate rlibs are not really "lost", only the codegen passes, which are not cheap but also generally not as expensive as the LLVM passes before.
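As a rough check against the geckoservo timings posted earlier in this thread, the following Python sketch splits the "LLVM passes" time into optimization passes and codegen passes. It assumes the "codegen passes [0]" lines correspond to the codegen work that is thrown away, which is an interpretation of the log and not something the log itself states.

```python
# Timings in seconds, copied from the -Ztime-passes output earlier in this thread.
llvm_function_passes = 4.810
llvm_module_passes = 79.068
codegen_passes = 21.767 + 0.001  # the two "codegen passes [0]" lines
llvm_total = 107.035             # the "LLVM passes" line


def share(part, whole):
    """Fraction of `whole` taken up by `part`."""
    return part / whole


optimization = llvm_function_passes + llvm_module_passes
print(f"optimization passes: {share(optimization, llvm_total):.0%} of LLVM passes")
print(f"codegen passes:      {share(codegen_passes, llvm_total):.0%} of LLVM passes")
```

Under that assumption, roughly a fifth of the LLVM time for this crate would be the part that is not reused, with the remainder going to optimization passes whose result ends up in the bitcode.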

There is a different way of going about code generation and optimization for Rust code though. We call it MIR-only RLIBs and in this model we would generate neither LLVM IR nor machine code for RLIBs. Only when building an actual binary would the compiler instantiate the things from RLIB dependencies.

Compiling RLIBs would be massively sped up in this model, however, building the binary would also be that much slower. So it's not entirely clear that this is a win in overall build times, especially if one is mostly working on leaf crates and doesn't have to rebuild intermediate RLIBs often.

However, it's still possible that this is a win because right now we are not sharing instantiations of generic functions between crates, potentially asking the compiler to optimize the exact same code over and over again. In the MIR-only RLIB model, there would always only be one instance per leaf crate. Also, there's more room for dead code elimination because at the moment, when building an RLIB, the compiler has to assume that it will later be linked into a Rust Dylib which would export/instantiate a lot more things than an executable, staticlib, or cdylib.
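The duplicated-instantiation argument can be sketched with a toy model in Python. The crate names and the instantiations they use are entirely hypothetical; the point is only to contrast "every crate optimizes its own copy" with "instantiate once when building the leaf crate".

```python
# Hypothetical crate graph: which generic instantiations each crate's code uses.
# Crate names and instantiations are made up for illustration.
USES = {
    "liba": {"Vec<u32>::push", "HashMap<String, u32>::insert"},
    "libb": {"Vec<u32>::push", "Option<u8>::map"},
    "app": {"Vec<u32>::push", "HashMap<String, u32>::insert"},
}


def per_crate_instantiations(uses):
    """Model without sharing: each crate instantiates and optimizes its own copies."""
    return sum(len(insts) for insts in uses.values())


def mir_only_instantiations(uses):
    """MIR-only model: each distinct instantiation is generated once, in the leaf crate."""
    return len(set().union(*uses.values()))


print("without sharing:", per_crate_instantiations(USES))
print("MIR-only:       ", mir_only_instantiations(USES))
```

In this made-up graph the count drops from six optimized instances to three. How large the effect is for a real project depends on how much generic code is shared across its crates, which this sketch does not measure.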

@glandium When building with LTO enabled, are you mostly interested in reducing the build times for Rust developers or rather scenarios where Rust code is largely unmodified between builds?

michaelwoerister · 2017-07-18T10:53:31Z

I'm debating whether it makes sense to have kind of a "soft launch" for MIR-only RLIBs:

- We modify Cargo to pass -C lto also when building RLIBs (which I think it doesn't at the moment).
- When an RLIB is built with -C lto it becomes a MIR-only RLIB: No trans, no LLVM, no linking.
- When a binary is built, the compiler takes care of instantiating all exported functions and statics from any MIR-only RLIBs in the dependency graph. It would be able to work with any mix of regular and MIR-only RLIBs, whether LTO is enabled or not.

One would be able to opt into the new model in a backwards compatible way.
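A minimal Python sketch of the last step, assuming a simplified picture in which each RLIB is just a name, a MIR-only flag, and a list of exported items. The `Rlib` type, `items_to_instantiate`, and the crate names are hypothetical and do not correspond to anything in the compiler.

```python
from dataclasses import dataclass, field


@dataclass
class Rlib:
    """Hypothetical stand-in for an RLIB in the dependency graph."""
    name: str
    mir_only: bool
    exported: list = field(default_factory=list)


def items_to_instantiate(deps):
    """Items the binary build would have to generate code for itself.

    Regular RLIBs already carry compiled code, so only MIR-only ones contribute.
    """
    return [f"{d.name}::{item}" for d in deps if d.mir_only for item in d.exported]


deps = [
    Rlib("regular_dep", mir_only=False, exported=["parse"]),
    Rlib("mir_only_dep", mir_only=True, exported=["render", "CONFIG"]),
]
print(items_to_instantiate(deps))
```

The mix of one regular and one MIR-only dependency mirrors the backwards-compatible opt-in described above: only the crates built as MIR-only add work to the final binary build.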

What do you think, @alexcrichton, @brson?

alexcrichton · 2017-07-18T15:53:37Z

Sounds reasonable to me!

glandium · 2017-07-20T07:03:33Z

> When building with LTO enabled, are you mostly interested in reducing the build times for Rust developers or rather scenarios where Rust code is largely unmodified between builds?

I guess both, but it's hard to have it both ways.

nnethercote · 2020-05-06T23:31:42Z

With recent changes, #70458 and #71528 in particular, rlibs created during LTO builds now only contain metadata and LLVM bitcode, but no object code. Previously they did contain object code. IIUC, that addresses the original complaint of this issue.

glandium mentioned this issue Jul 13, 2017

cargo should avoid building things it doesn't need rust-lang/cargo#4280

Open

alexcrichton added the A-codegen (Area: Code generation) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) labels Jul 13, 2017

Mark-Simulacrum added the C-enhancement (Category: An issue proposing an enhancement or a PR with one.) and I-compiletime (Issue: Problems and improvements with respect to compile times.) labels Jul 28, 2017

nnethercote closed this as completed May 6, 2020
