-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustc: Enable LTO and multiple codegen units #44783
Conversation
r? @arielb1 (rust_highfive has picked a reviewer for you, use r? to override) |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to storing and loading bitcode in rlibs LGTM. I don't know enough about the whole work items and job server stuff to review the rest, though I did skim it.
src/librustc_trans/back/bytecode.rs
Outdated
//! elsewhere, so we currently compress the bytecode via deflate to avoid taking | ||
//! up too much space on disk. | ||
//! | ||
//! After compressing the bytcode we then have the rest of the format to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: s/bytcode/bytecode/
src/librustc_trans/back/bytecode.rs
Outdated
//! up too much space on disk. | ||
//! | ||
//! After compressing the bytcode we then have the rest of the format to | ||
//! basically deal with various bugs in various archive implemenatations. THe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: s/THe/The/
src/librustc_trans/back/link.rs
Outdated
@@ -1310,7 +1225,7 @@ fn add_upstream_rust_crates(cmd: &mut Linker, | |||
archive.update_symbols(); | |||
|
|||
for f in archive.src_files() { | |||
if f.ends_with("bytecode.deflate") || f == METADATA_FILENAME { | |||
if f.ends_with("bytecode.encoded") || f == METADATA_FILENAME { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're already updating all the places that involve this magic suffix, maybe pull the string out into a constant?
src/librustc_trans/back/lto.rs
Outdated
llvm::LLVMRustRunRestrictionPass(llmod, | ||
ptr as *const *const libc::c_char, | ||
arr.len() as libc::size_t); | ||
symbol_white_list.len() as libc::size_t); | ||
cgcx.save_temp_bitcode(&module, "lto.after-restrictoin"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: s/restrictoin/restriction/
src/rustllvm/RustWrapper.cpp
Outdated
|
||
std::string Err; | ||
raw_string_ostream Stream(Err); | ||
DiagnosticPrinterRawOStream DP(Stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unused?
return Err(format!("bytecode corrupted")) | ||
} | ||
let identifier_len = unsafe { | ||
u32::from_le(ptr::read_unaligned(data.as_ptr() as *const u32)) as usize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish this code used byteorder. Oh well, a refactoring for another day.
1077714
to
6ea1259
Compare
Nice! I'll review this after the weekend. |
// option. This file may not be copied, modified, or distributed | ||
// except according to those terms. | ||
|
||
// compile-flags: -C lto -C codegen-units=8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
[00:49:33] ---- [run-pass] run-pass/lto-many-codegen-units.rs stdout ----
[00:49:33]
[00:49:33] error: compilation failed!
[00:49:33] status: exit code: 101
[00:49:33] command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/run-pass/lto-many-codegen-units.rs" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass" "--target=x86_64-unknown-linux-gnu" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass/lto-many-codegen-units.stage2-x86_64-unknown-linux-gnu" "-Crpath" "-O" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-C" "lto" "-C" "codegen-units=8" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/run-pass/lto-many-codegen-units.stage2-x86_64-unknown-linux-gnu.run-pass.libaux"
[00:49:33] stdout:
[00:49:33] ------------------------------------------
[00:49:33]
[00:49:33] ------------------------------------------
[00:49:33] stderr:
[00:49:33] ------------------------------------------
[00:49:33] error: cannot prefer dynamic linking when performing LTO
[00:49:33]
[00:49:33] note: only 'staticlib', 'bin', and 'cdylib' outputs are supported with LTO
[00:49:33]
[00:49:33] error: aborting due to previous error
[00:49:33]
[00:49:33] thread '<unnamed>' panicked at 'Box<Any>', /checkout/src/librustc_trans/back/write.rs:521:65
[00:49:33]
[00:49:33] ------------------------------------------
[00:49:33]
[00:49:33] thread '[run-pass] run-pass/lto-many-codegen-units.rs' panicked at 'explicit panic', /checkout/src/tools/compiletest/src/runtest.rs:2433:8
[00:49:33] note: Run with `RUST_BACKTRACE=1` for a backtrace.
[00:49:33]
[00:49:33]
[00:49:33] failures:
[00:49:33] [run-pass] run-pass/lto-many-codegen-units.rs
[00:49:33]
[00:49:33] test result: FAILED. 2762 passed; 1 failed; 8 ignored; 0 measured; 0 filtered out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks @alexcrichton!
r=me with the comments addressed.
src/librustc_trans/back/bytecode.rs
Outdated
//! up too much space on disk. | ||
//! | ||
//! After compressing the bytecode we then have the rest of the format to | ||
//! basically deal with various bugs in various archive implemenatations. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implementations
src/librustc_trans/back/bytecode.rs
Outdated
|
||
// Next is the LLVM module deflate compressed, prefixed with its length. We | ||
// don't know its length yet, so fill in 0s | ||
let deflated_size = encoded.len(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the binding here is a bit misleading. Could you change that to something like deflated_size_pos
?
// bitcode. All modules were translated in their own LLVM context, however, | ||
// and we want to move everything to the same LLVM context. Currently the | ||
// way we know of to do that is to serialize them to a string and them parse | ||
// them later. Not great but hey, that's why it's "fat" LTO, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this won't be the most performant thing to do but my guess is that it's still cheap compared to optimizing the resulting mega-module. Plus, it's a corner case anyway. So: 👍
@@ -1304,6 +1365,8 @@ fn start_executing_work(tcx: TyCtxt, | |||
let mut compiled_modules = vec![]; | |||
let mut compiled_metadata_module = None; | |||
let mut compiled_allocator_module = None; | |||
let mut needs_lto = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would mind updating the big comment above with how LTO fits into the picture. It's not immediately clear to me what work gets executed when and where -- e.g. it seems that a whole bunch of work gets done on the scheduler thread via generate_lto_work
. That deserves to be mentioned, I think.
// this as we're not working with this dual "rlib/dylib" functionality. | ||
let allocator_module = if tcx.sess.lto() { | ||
None | ||
} else if let Some(kind) = tcx.sess.allocator_kind.get() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's nice that we can get rid of the special casing here. If it's easy, it would be nice to make sure we don't serialize the big "actual-code-module" and merge that into the tiny allocator module. Although it's a special case to even have an allocator module, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I'll just switch LTO to merging everything into the "costliest" module
6ea1259
to
bf3be56
Compare
@bors: r=michaelwoerister |
📌 Commit bf3be56 has been approved by |
src/rustllvm/RustWrapper.cpp
Outdated
LLVMRustModuleCost(LLVMModuleRef M) { | ||
Module &Mod = *unwrap(M); | ||
uint64_t cost = 0; | ||
for (auto &GO : Mod.global_objects()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot build rustllvm
, at least for the CI (LLVM 3.7):
[00:04:04] cargo:warning=../rustllvm/RustWrapper.cpp: In function 'uint64_t LLVMRustModuleCost(LLVMModuleRef)':
[00:04:04] cargo:warning=../rustllvm/RustWrapper.cpp:1462:23: error: 'class llvm::Module' has no member named 'global_objects'
[00:04:04] cargo:warning= for (auto &GO : Mod.global_objects()) {
[00:04:04] cargo:warning= ^
[00:04:04] exit code: 1
@bors: r- |
bf3be56
to
6154e04
Compare
@bors: r=michaelwoerister |
📌 Commit 6154e04 has been approved by |
☔ The latest upstream changes (presumably #44085) made this pull request unmergeable. Please resolve the merge conflicts. |
6154e04
to
c3c6592
Compare
@bors: r=michaelwoerister |
📌 Commit c3c6592 has been approved by |
CI is still failing, not sure if legit.
|
@bors: r- definitely legit |
c3c6592
to
3547f2d
Compare
@bors: r=michaelwoerister |
📌 Commit 3547f2d has been approved by |
⌛ Testing commit 3547f2d8e3cf4ab63041b7463046a6d9d893205a with merge 3d3506dc9f13c3dc72ceb531e62011af86774cbe... |
💔 Test failed - status-travis |
encoded[deflated_size_pos + 4] = (bytecode_len >> 32) as u8; | ||
encoded[deflated_size_pos + 5] = (bytecode_len >> 40) as u8; | ||
encoded[deflated_size_pos + 6] = (bytecode_len >> 48) as u8; | ||
encoded[deflated_size_pos + 7] = (bytecode_len >> 56) as u8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytecode_len
is an usize
. It should be cast to u64
, or the last 4 statements should be #[cfg]
'ed out.
Failed on `i686-gnu`:
[00:19:51] error: bitshift exceeds the type's number of bits
[00:19:51] --> /checkout/src/librustc_trans/back/bytecode.rs:88:38
[00:19:51] |
[00:19:51] 88 | encoded[deflated_size_pos + 4] = (bytecode_len >> 32) as u8;
[00:19:51] | ^^^^^^^^^^^^^^^^^^^^
[00:19:51] |
[00:19:51] = note: #[deny(exceeding_bitshifts)] on by default
[00:19:51]
[00:19:51] error: bitshift exceeds the type's number of bits
[00:19:51] --> /checkout/src/librustc_trans/back/bytecode.rs:89:38
[00:19:51] |
[00:19:51] 89 | encoded[deflated_size_pos + 5] = (bytecode_len >> 40) as u8;
[00:19:51] | ^^^^^^^^^^^^^^^^^^^^
[00:19:51]
[00:19:51] error: bitshift exceeds the type's number of bits
[00:19:51] --> /checkout/src/librustc_trans/back/bytecode.rs:90:38
[00:19:51] |
[00:19:51] 90 | encoded[deflated_size_pos + 6] = (bytecode_len >> 48) as u8;
[00:19:51] | ^^^^^^^^^^^^^^^^^^^^
[00:19:51]
[00:19:51] error: bitshift exceeds the type's number of bits
[00:19:51] --> /checkout/src/librustc_trans/back/bytecode.rs:91:38
[00:19:51] |
[00:19:51] 91 | encoded[deflated_size_pos + 7] = (bytecode_len >> 56) as u8;
[00:19:51] | ^^^^^^^^^^^^^^^^^^^^
[00:19:51]
[00:19:51] error: aborting due to 4 previous errors
[00:19:51]
[00:19:51] error: Could not compile `rustc_trans`.
3547f2d
to
b377366
Compare
@bors: r=michaelwoerister |
📌 Commit b377366 has been approved by |
⌛ Testing commit b377366f7bb60653cb10b56ddd61313aca67906d with merge e1e340a524902f341944f9c68ef57b6aa69c4b39... |
💔 Test failed - status-travis |
The LTO run-pass tests seg-faulted on
|
I spy.... at least one use after free |
b377366
to
c077360
Compare
@bors: r=michaelwoerister |
📌 Commit c077360 has been approved by |
⌛ Testing commit c077360147a7fe2cc03dfdca647f7c7b95614284 with merge 0b4c7760c55425321b2944ebfa6fc4f7c0fe1884... |
💔 Test failed - status-travis |
The compile-fail tests involving ASM did not fail in
|
☔ The latest upstream changes (presumably #44853) made this pull request unmergeable. Please resolve the merge conflicts. |
This commit is a refactoring of the LTO backend in Rust to support compilations with multiple codegen units. The immediate result of this PR is to remove the artificial error emitted by rustc about `-C lto -C codegen-units-8`, but longer term this is intended to lay the groundwork for LTO with incremental compilation and ultimately be the underpinning of ThinLTO support. The problem here that needed solving is that when rustc is producing multiple codegen units in one compilation LTO needs to merge them all together. Previously only upstream dependencies were merged and it was inherently relied on that there was only one local codegen unit. Supporting this involved refactoring the optimization backend architecture for rustc, namely splitting the `optimize_and_codegen` function into `optimize` and `codegen`. After an LLVM module has been optimized it may be blocked and queued up for LTO, and only after LTO are modules code generated. Non-LTO compilations should look the same as they do today backend-wise, we'll spin up a thread for each codegen unit and optimize/codegen in that thread. LTO compilations will, however, send the LLVM module back to the coordinator thread once optimizations have finished. When all LLVM modules have finished optimizing the coordinator will invoke the LTO backend, producing a further list of LLVM modules. Currently this is always a list of one LLVM module. The coordinator then spawns further work to run LTO and code generation passes over each module. In the course of this refactoring a number of other pieces were refactored: * Management of the bytecode encoding in rlibs was centralized into one module instead of being scattered across LTO and linking. * Some internal refactorings on the link stage of the compiler was done to work directly from `CompiledModule` structures instead of lists of paths. * The trans time-graph output was tweaked a little to include a name on each bar and inflate the size of the bars a little
c077360
to
ded38db
Compare
@bors: r=michaelwoerister |
📌 Commit ded38db has been approved by |
…ster rustc: Enable LTO and multiple codegen units This commit is a refactoring of the LTO backend in Rust to support compilations with multiple codegen units. The immediate result of this PR is to remove the artificial error emitted by rustc about `-C lto -C codegen-units-8`, but longer term this is intended to lay the groundwork for LTO with incremental compilation and ultimately be the underpinning of ThinLTO support. The problem here that needed solving is that when rustc is producing multiple codegen units in one compilation LTO needs to merge them all together. Previously only upstream dependencies were merged and it was inherently relied on that there was only one local codegen unit. Supporting this involved refactoring the optimization backend architecture for rustc, namely splitting the `optimize_and_codegen` function into `optimize` and `codegen`. After an LLVM module has been optimized it may be blocked and queued up for LTO, and only after LTO are modules code generated. Non-LTO compilations should look the same as they do today backend-wise, we'll spin up a thread for each codegen unit and optimize/codegen in that thread. LTO compilations will, however, send the LLVM module back to the coordinator thread once optimizations have finished. When all LLVM modules have finished optimizing the coordinator will invoke the LTO backend, producing a further list of LLVM modules. Currently this is always a list of one LLVM module. The coordinator then spawns further work to run LTO and code generation passes over each module. In the course of this refactoring a number of other pieces were refactored: * Management of the bytecode encoding in rlibs was centralized into one module instead of being scattered across LTO and linking. * Some internal refactorings on the link stage of the compiler was done to work directly from `CompiledModule` structures instead of lists of paths. * The trans time-graph output was tweaked a little to include a name on each bar and inflate the size of the bars a little
☀️ Test successful - status-appveyor, status-travis |
This was actually merged. |
This commit is a refactoring of the LTO backend in Rust to support compilations
with multiple codegen units. The immediate result of this PR is to remove the
artificial error emitted by rustc about
-C lto -C codegen-units-8
, but longerterm this is intended to lay the groundwork for LTO with incremental compilation
and ultimately be the underpinning of ThinLTO support.
The problem here that needed solving is that when rustc is producing multiple
codegen units in one compilation LTO needs to merge them all together.
Previously only upstream dependencies were merged and it was inherently relied
on that there was only one local codegen unit. Supporting this involved
refactoring the optimization backend architecture for rustc, namely splitting
the
optimize_and_codegen
function intooptimize
andcodegen
. After an LLVMmodule has been optimized it may be blocked and queued up for LTO, and only
after LTO are modules code generated.
Non-LTO compilations should look the same as they do today backend-wise, we'll
spin up a thread for each codegen unit and optimize/codegen in that thread. LTO
compilations will, however, send the LLVM module back to the coordinator thread
once optimizations have finished. When all LLVM modules have finished optimizing
the coordinator will invoke the LTO backend, producing a further list of LLVM
modules. Currently this is always a list of one LLVM module. The coordinator
then spawns further work to run LTO and code generation passes over each module.
In the course of this refactoring a number of other pieces were refactored:
instead of being scattered across LTO and linking.
directly from
CompiledModule
structures instead of lists of paths.bar and inflate the size of the bars a little