Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc: Prepare to enable ThinLTO by default #46382

Merged
merged 3 commits into from
Dec 3, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 15 additions & 3 deletions src/librustc/session/config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -383,8 +383,13 @@ top_level_options!(
// try to not rely on this too much.
actually_rustdoc: bool [TRACKED],

// Number of object files/codegen units to produce on the backend
// Specifications of codegen units / ThinLTO which are forced as a
// result of parsing command line options. These are not necessarily
// what rustc was invoked with, but massaged a bit to agree with
// commands like `--emit llvm-ir` which they're often incompatible with
// if we otherwise use the defaults of rustc.
cli_forced_codegen_units: Option<usize> [UNTRACKED],
cli_forced_thinlto: Option<bool> [UNTRACKED],
}
);

Expand Down Expand Up @@ -566,6 +571,7 @@ pub fn basic_options() -> Options {
debug_assertions: true,
actually_rustdoc: false,
cli_forced_codegen_units: None,
cli_forced_thinlto: None,
}
}

Expand Down Expand Up @@ -1165,7 +1171,7 @@ options! {DebuggingOptions, DebuggingSetter, basic_debugging_options,
"run the non-lexical lifetimes MIR pass"),
trans_time_graph: bool = (false, parse_bool, [UNTRACKED],
"generate a graphical HTML report of time spent in trans and LLVM"),
thinlto: bool = (false, parse_bool, [TRACKED],
thinlto: Option<bool> = (None, parse_opt_bool, [TRACKED],
"enable ThinLTO when possible"),
inline_in_all_cgus: Option<bool> = (None, parse_opt_bool, [TRACKED],
"control whether #[inline] functions are in all cgus"),
Expand Down Expand Up @@ -1601,6 +1607,7 @@ pub fn build_session_options_and_crate_config(matches: &getopts::Matches)

let mut cg = build_codegen_options(matches, error_format);
let mut codegen_units = cg.codegen_units;
let mut thinlto = None;

// Issue #30063: if user requests llvm-related output to one
// particular path, disable codegen-units.
Expand All @@ -1622,9 +1629,13 @@ pub fn build_session_options_and_crate_config(matches: &getopts::Matches)
}
early_warn(error_format, "resetting to default -C codegen-units=1");
codegen_units = Some(1);
thinlto = Some(false);
}
}
_ => codegen_units = Some(1),
_ => {
codegen_units = Some(1);
thinlto = Some(false);
}
}
}

Expand Down Expand Up @@ -1834,6 +1845,7 @@ pub fn build_session_options_and_crate_config(matches: &getopts::Matches)
debug_assertions,
actually_rustdoc: false,
cli_forced_codegen_units: codegen_units,
cli_forced_thinlto: thinlto,
},
cfg)
}
Expand Down
103 changes: 82 additions & 21 deletions src/librustc/session/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -656,30 +656,91 @@ impl Session {
return n as usize
}

// Why is 16 codegen units the default all the time?
//
// The main reason for enabling multiple codegen units by default is to
// leverage the ability for the trans backend to do translation and
// codegen in parallel. This allows us, especially for large crates, to
// make good use of all available resources on the machine once we've
// hit that stage of compilation. Large crates especially then often
// take a long time in trans/codegen and this helps us amortize that
// cost.
//
// Note that a high number here doesn't mean that we'll be spawning a
// large number of threads in parallel. The backend of rustc contains
// global rate limiting through the `jobserver` crate so we'll never
// overload the system with too much work, but rather we'll only be
// optimizing when we're otherwise cooperating with other instances of
// rustc.
//
// Rather a high number here means that we should be able to keep a lot
// of idle cpus busy. By ensuring that no codegen unit takes *too* long
// to build we'll be guaranteed that all cpus will finish pretty closely
// to one another and we should make relatively optimal use of system
// resources
//
// Note that the main cost of codegen units is that it prevents LLVM
// from inlining across codegen units. Users in general don't have a lot
// of control over how codegen units are split up so it's our job in the
// compiler to ensure that undue performance isn't lost when using
// codegen units (aka we can't require everyone to slap `#[inline]` on
// everything).
//
// If we're compiling at `-O0` then the number doesn't really matter too
// much because performance doesn't matter and inlining is ok to lose.
// In debug mode we just want to try to guarantee that no cpu is stuck
// doing work that could otherwise be farmed to others.
//
// In release mode, however (O1 and above) performance does indeed
// matter! To recover the loss in performance due to inlining we'll be
// enabling ThinLTO by default (the function for which is just below).
// This will ensure that we recover any inlining wins we otherwise lost
// through codegen unit partitioning.
//
// ---
//
// Ok that's a lot of words but the basic tl;dr; is that we want a high
// number here -- but not too high. Additionally we're "safe" to have it
// always at the same number at all optimization levels.
//
// As a result 16 was chosen here! Mostly because it was a power of 2
// and most benchmarks agreed it was roughly a local optimum. Not very
// scientific.
match self.opts.optimize {
// If we're compiling at `-O0` then default to 16 codegen units.
// The number here shouldn't matter too too much as debug mode
// builds don't rely on performance at all, meaning that lost
// opportunities for inlining through multiple codegen units is
// a non-issue.
//
// Note that the high number here doesn't mean that we'll be
// spawning a large number of threads in parallel. The backend
// of rustc contains global rate limiting through the
// `jobserver` crate so we'll never overload the system with too
// much work, but rather we'll only be optimizing when we're
// otherwise cooperating with other instances of rustc.
//
// Rather the high number here means that we should be able to
// keep a lot of idle cpus busy. By ensuring that no codegen
// unit takes *too* long to build we'll be guaranteed that all
// cpus will finish pretty closely to one another and we should
// make relatively optimal use of system resources
config::OptLevel::No => 16,
_ => 1, // FIXME(#46346) this should be 16
}
}

// All other optimization levels default use one codegen unit,
// the historical default in Rust for a Long Time.
_ => 1,
/// Returns whether ThinLTO is enabled for this compilation
pub fn thinlto(&self) -> bool {
// If processing command line options determined that we're incompatible
// with ThinLTO (e.g. `-C lto --emit llvm-ir`) then return that option.
if let Some(enabled) = self.opts.cli_forced_thinlto {
return enabled
}

// If explicitly specified, use that with the next highest priority
if let Some(enabled) = self.opts.debugging_opts.thinlto {
return enabled
}

// If there's only one codegen unit and LTO isn't enabled then there's
// no need for ThinLTO so just return false.
if self.codegen_units() == 1 && !self.lto() {
return false
}

// Right now ThinLTO isn't compatible with incremental compilation.
if self.opts.incremental.is_some() {
return false
}

// Now we're in "defaults" territory. By default we enable ThinLTO for
// optimized compiles (anything greater than O0).
match self.opts.optimize {
config::OptLevel::No => false,
_ => true,
}
}
}
Expand Down
5 changes: 3 additions & 2 deletions src/librustc_trans/back/write.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1402,8 +1402,9 @@ fn start_executing_work(tcx: TyCtxt,
// for doesn't require full LTO. Some targets require one LLVM module
// (they effectively don't have a linker) so it's up to us to use LTO to
// link everything together.
thinlto: sess.opts.debugging_opts.thinlto &&
!sess.target.target.options.requires_lto,
thinlto: sess.thinlto() &&
!sess.target.target.options.requires_lto &&
unsafe { llvm::LLVMRustThinLTOAvailable() },

no_landing_pads: sess.no_landing_pads(),
save_temps: sess.opts.cg.save_temps,
Expand Down
2 changes: 1 addition & 1 deletion src/librustc_trans/base.rs
Original file line number Diff line number Diff line change
Expand Up @@ -706,7 +706,7 @@ pub fn trans_crate<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>,

check_for_rustc_errors_attr(tcx);

if tcx.sess.opts.debugging_opts.thinlto {
if let Some(true) = tcx.sess.opts.debugging_opts.thinlto {
if unsafe { !llvm::LLVMRustThinLTOAvailable() } {
tcx.sess.fatal("this compiler's LLVM does not support ThinLTO");
}
Expand Down
22 changes: 20 additions & 2 deletions src/libstd/sys_common/backtrace.rs
Original file line number Diff line number Diff line change
Expand Up @@ -252,8 +252,26 @@ fn output_fileline(w: &mut Write,
// Note that this demangler isn't quite as fancy as it could be. We have lots
// of other information in our symbols like hashes, version, type information,
// etc. Additionally, this doesn't handle glue symbols at all.
pub fn demangle(writer: &mut Write, s: &str, format: PrintFormat) -> io::Result<()> {
// First validate the symbol. If it doesn't look like anything we're
pub fn demangle(writer: &mut Write, mut s: &str, format: PrintFormat) -> io::Result<()> {
// During ThinLTO LLVM may import and rename internal symbols, so strip out
// those endings first as they're one of the last manglings applied to
// symbol names.
let llvm = ".llvm.";
if let Some(i) = s.find(llvm) {
let candidate = &s[i + llvm.len()..];
let all_hex = candidate.chars().all(|c| {
match c {
'A' ... 'F' | '0' ... '9' => true,
_ => false,
}
});

if all_hex {
s = &s[..i];
}
}

// Validate the symbol. If it doesn't look like anything we're
// expecting, we just print it literally. Note that we must handle non-rust
// symbols because we could have any function in the backtrace.
let mut valid = true;
Expand Down
110 changes: 23 additions & 87 deletions src/rustllvm/PassWrapper.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
#include <stdio.h>

#include <vector>
#include <set>

#include "rustllvm.h"

Expand Down Expand Up @@ -885,86 +886,6 @@ getFirstDefinitionForLinker(const GlobalValueSummaryList &GVSummaryList) {
return FirstDefForLinker->get();
}

// This is a helper function we added that isn't present in LLVM's source.
//
// The way LTO works in Rust is that we typically have a number of symbols that
// we know ahead of time need to be preserved. We want to ensure that ThinLTO
// doesn't accidentally internalize any of these and otherwise is always
// ready to keep them linking correctly.
//
// This function will recursively walk the `GUID` provided and all of its
// references, as specified in the `Index`. In other words, we're taking a
// `GUID` as input, adding it to `Preserved`, and then taking all `GUID`
// items that the input references and recursing.
static void
addPreservedGUID(const ModuleSummaryIndex &Index,
DenseSet<GlobalValue::GUID> &Preserved,
GlobalValue::GUID GUID) {
if (Preserved.count(GUID))
return;
Preserved.insert(GUID);

#if LLVM_VERSION_GE(5, 0)
auto Info = Index.getValueInfo(GUID);
if (!Info) {
return;
}
for (auto &Summary : Info.getSummaryList()) {
for (auto &Ref : Summary->refs()) {
addPreservedGUID(Index, Preserved, Ref.getGUID());
}

GlobalValueSummary *GVSummary = Summary.get();
if (isa<FunctionSummary>(GVSummary)) {
auto *FS = cast<FunctionSummary>(GVSummary);
for (auto &Call: FS->calls()) {
addPreservedGUID(Index, Preserved, Call.first.getGUID());
}
for (auto &GUID: FS->type_tests()) {
addPreservedGUID(Index, Preserved, GUID);
}
}
if (isa<AliasSummary>(GVSummary)) {
auto *AS = cast<AliasSummary>(GVSummary);
auto GUID = AS->getAliasee().getOriginalName();
addPreservedGUID(Index, Preserved, GUID);
}
}
#else
auto SummaryList = Index.findGlobalValueSummaryList(GUID);
if (SummaryList == Index.end())
return;
for (auto &Summary : SummaryList->second) {
for (auto &Ref : Summary->refs()) {
if (Ref.isGUID()) {
addPreservedGUID(Index, Preserved, Ref.getGUID());
} else {
auto Value = Ref.getValue();
addPreservedGUID(Index, Preserved, Value->getGUID());
}
}

if (auto *FS = dyn_cast<FunctionSummary>(Summary.get())) {
for (auto &Call: FS->calls()) {
if (Call.first.isGUID()) {
addPreservedGUID(Index, Preserved, Call.first.getGUID());
} else {
auto Value = Call.first.getValue();
addPreservedGUID(Index, Preserved, Value->getGUID());
}
}
for (auto &GUID: FS->type_tests()) {
addPreservedGUID(Index, Preserved, GUID);
}
}
if (auto *AS = dyn_cast<AliasSummary>(Summary.get())) {
auto GUID = AS->getAliasee().getOriginalName();
addPreservedGUID(Index, Preserved, GUID);
}
}
#endif
}

// The main entry point for creating the global ThinLTO analysis. The structure
// here is basically the same as before threads are spawned in the `run`
// function of `lib/LTO/ThinLTOCodeGenerator.cpp`.
Expand Down Expand Up @@ -1004,12 +925,10 @@ LLVMRustCreateThinLTOData(LLVMRustThinLTOModule *modules,
Ret->Index.collectDefinedGVSummariesPerModule(Ret->ModuleToDefinedGVSummaries);

// Convert the preserved symbols set from string to GUID, this is then needed
// for internalization. We use `addPreservedGUID` to include any transitively
// used symbol as well.
// for internalization.
for (int i = 0; i < num_symbols; i++) {
addPreservedGUID(Ret->Index,
Ret->GUIDPreservedSymbols,
GlobalValue::getGUID(preserved_symbols[i]));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this too conservative before?

auto GUID = GlobalValue::getGUID(preserved_symbols[i]);
Ret->GUIDPreservedSymbols.insert(GUID);
}

// Collect the import/export lists for all modules from the call-graph in the
Expand Down Expand Up @@ -1038,7 +957,8 @@ LLVMRustCreateThinLTOData(LLVMRustThinLTOModule *modules,
// Resolve LinkOnce/Weak symbols, this has to be computed early be cause it
// impacts the caching.
//
// This is copied from `lib/LTO/ThinLTOCodeGenerator.cpp`
// This is copied from `lib/LTO/ThinLTOCodeGenerator.cpp` with some of this
// being lifted from `lib/LTO/LTO.cpp` as well
StringMap<std::map<GlobalValue::GUID, GlobalValue::LinkageTypes>> ResolvedODR;
DenseMap<GlobalValue::GUID, const GlobalValueSummary *> PrevailingCopy;
for (auto &I : Ret->Index) {
Expand All @@ -1062,11 +982,27 @@ LLVMRustCreateThinLTOData(LLVMRustThinLTOModule *modules,
ResolvedODR[ModuleIdentifier][GUID] = NewLinkage;
};
thinLTOResolveWeakForLinkerInIndex(Ret->Index, isPrevailing, recordNewLinkage);

// Here we calculate an `ExportedGUIDs` set for use in the `isExported`
// callback below. This callback below will dictate the linkage for all
// summaries in the index, and we basically just only want to ensure that dead
// symbols are internalized. Otherwise everything that's already external
// linkage will stay as external, and internal will stay as internal.
std::set<GlobalValue::GUID> ExportedGUIDs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use a std::unordered_set here.

for (auto &List : Ret->Index) {
for (auto &GVS: List.second) {
if (!GlobalValue::isExternalLinkage(GVS->linkage()))
continue;
auto GUID = GVS->getOriginalName();
if (!DeadSymbols.count(GUID))
ExportedGUIDs.insert(GUID);
}
}
auto isExported = [&](StringRef ModuleIdentifier, GlobalValue::GUID GUID) {
const auto &ExportList = Ret->ExportLists.find(ModuleIdentifier);
return (ExportList != Ret->ExportLists.end() &&
ExportList->second.count(GUID)) ||
Ret->GUIDPreservedSymbols.count(GUID);
ExportedGUIDs.count(GUID);
};
thinLTOInternalizeAndPromoteInIndex(Ret->Index, isExported);

Expand Down
2 changes: 1 addition & 1 deletion src/test/run-fail/mir_trans_no_landing_pads.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

// compile-flags: -Z no-landing-pads
// compile-flags: -Z no-landing-pads -C codegen-units=1
// error-pattern:converging_fn called
use std::io::{self, Write};

Expand Down
2 changes: 1 addition & 1 deletion src/test/run-fail/mir_trans_no_landing_pads_diverging.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
// option. This file may not be copied, modified, or distributed
// except according to those terms.

// compile-flags: -Z no-landing-pads
// compile-flags: -Z no-landing-pads -C codegen-units=1
// error-pattern:diverging_fn called
use std::io::{self, Write};

Expand Down
Loading