Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to hoist the conversion of diverging calls to aborts to a layer above MIR lowering #122956

Open
saethlin opened this issue Mar 23, 2024 · 12 comments
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html E-help-wanted Call for participation: Help is requested to fix this issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@saethlin
Copy link
Member

In #122580 I added some code that prevents the magic compiler_builtins crate from linking against core. The replacement of calls happens as late as possible, so we end up with some odd behavior, where the LLVM IR and LLVM bitcode contain declarations of panic functions from core that are never called. We have those declarations because of this loop:

for &(mono_item, data) in &mono_items {
mono_item.predefine::<Builder<'_, '_, '_>>(&cx, data.linkage, data.visibility);
}

We have MonoItems for the panic functions, and no calls to them. The key point is that we have MonoItems for items that we know would be invalid to call.

Is it possible to not have those MonoItems in builds of compiler_builtins?

One possible implementation of this would be to hoist this "error or abort on upstream call" to a MIR transform that's only run when building compiler_builtins. Then maybe we could drop the logic from cg_ssa?

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 23, 2024
@saethlin saethlin added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. E-help-wanted Call for participation: Help is requested to fix this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Mar 23, 2024
@jieyouxu jieyouxu added A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Mar 23, 2024
@littlebenlittle
Copy link

Two questions here from a rustc newbie:

  1. LLVM does dead code elimination. Would this not apply here?
  2. If not, what's the harm in having dead code and how can that harm be quantified?

@saethlin
Copy link
Member Author

saethlin commented Jul 5, 2024

  1. LLVM does dead code elimination. Would this not apply here?

Only if optimizations are enabled; we want this behavior in debug builds too.

  1. If not, what's the harm in having dead code and how can that harm be quantified?

This issue we're talking on right now is about improving my PR that's linked in the original issue description, which is linked to (because it solves) this issue: #121552. The problem reported in that issue is the harm. The run-make test that's in my linked PR is a demonstration of the desired use case.

@littlebenlittle
Copy link

littlebenlittle commented Jul 6, 2024

Okay, thank you!

To check my understanding, the issue is caused because some targets link before optimizing and debug builds may not optimize at all. There are also crates (maybe only compiler_builtins?) that must be compiled before core which could still link to functions in core, even if those functions are not called in the first compilation pass. Hence dead code + no optimization = linker error. So your PR adds a check that will replace such calls with aborts, thus allowing these crates to "panic" without any dependence on core. Awesome!

My first thought is to use explicit aborts in anything that is compiled before core. This is because core doesn't exist yet and so panics don't have semantics--the programmer instructing any of these crates to panic is incorrect. My instinct is that adding logic to codegen or the MIR lowering phases would obfuscate this invariant by "magically" fixing it in a way that is transparent to the programmer. This could be a source of difficult to diagnose bugs in the future. However I have precisely zero experience maintaining a massive open-source toolchain, so please do let me know if I'm missing something here!

From my perspective the ideal is #![no_core] #29639, however it seems like this feature is still pending.

@littlebenlittle
Copy link

Okay, I re-read what I wrote above and I think I'm still a bit confused. Try 2:

In order to compile the code generator, we use functions defined in certain crates. Those crates may have other functions that depend on core, usually for panics. We know that calling any of those functions in the code generator is incorrect, so we convert those panics to aborts simply to avoid needing to link them.

So what I think I missed above is that there may be a need to refactor those crates before using #![no_core] or enforcing it in code review. Overall that seems like a good choice--clean separation of logic that needs to be compiled before core and logic that doesn't.

Again, please let me know if I'm totally off base here.

@saethlin
Copy link
Member Author

saethlin commented Jul 6, 2024

In order to compile the code generator

Nope, compiler-builtins is not a code generator. It's a low-level library, which has unique properties because it defines symbols that LLVM automatically just generates calls to. You can use it even though it's not in any dependency graph.

Code review is absolutely not a suitable replacement for automation. In this case it's especially not suitable, because no matter how good the code review is on compiler-builtins, changes to the compiler or to core change whether compiler-builtins contains linkable calls. If you follow the links you'll see that. This issue is linked to the PR #122580 which is a fix for the issue #121552 which was caused by #120594.

clean separation of logic that needs to be compiled before core

There is no such thing, really. compiler-builtins always uses code from core, that's deeply unavoidable because it wants to use all the basic language features which are implemented in core. If you look at the MIR for compiler-builtins it's absolutely riddle with use of core, but that all works out just fine so long as it doesn't need to link against core. That's the key in the PR that's linked here; it's about detecting and stubbing out or reporting an error on function calls that will require linkage. If the calls don't require linkage, they are fine. The definition of these unlinkable calls is not straightforward:

/// Returns whether a call from the current crate to the [`Instance`] would produce a call
/// from `compiler_builtins` to a symbol the linker must resolve.
///
/// Such calls from `compiler_bultins` are effectively impossible for the linker to handle. Some
/// linkers will optimize such that dead calls to unresolved symbols are not an error, but this is
/// not guaranteed. So we used this function in codegen backends to ensure we do not generate any
/// unlinkable calls.
///
/// Note that calls to LLVM intrinsics are uniquely okay because they won't make it to the linker.
pub fn is_call_from_compiler_builtins_to_upstream_monomorphization<'tcx>(
tcx: TyCtxt<'tcx>,
instance: Instance<'tcx>,
) -> bool {
fn is_llvm_intrinsic(tcx: TyCtxt<'_>, def_id: DefId) -> bool {
if let Some(name) = tcx.codegen_fn_attrs(def_id).link_name {
name.as_str().starts_with("llvm.")
} else {
false
}
}
let def_id = instance.def_id();
!def_id.is_local()
&& tcx.is_compiler_builtins(LOCAL_CRATE)
&& !is_llvm_intrinsic(tcx, def_id)
&& !should_codegen_locally(tcx, instance)
}

And then should_codegen_locally accounts for the fact that #[inline] and generic functions are compiled locally instead of being linked to.

@littlebenlittle
Copy link

littlebenlittle commented Jul 6, 2024

I'm looking at the cranelift compiler_builtins.rs. Here it seems like the purpose of this code is to map rust function calls directly to opcodes magically linked compiler functions. It's a very small file and I don't see any use of what I would think of as sophisticated language features. There is a call to std::ffi, so maybe there's an indirect dependency there?

If you look at the MIR for compiler-builtins it's absolutely riddle with use of core

I'm having trouble finding any reference to compiler-builtins for LLVM. It would be helpful to provide an example of code in the compiler-builtins crate that makes extensive use of core.

changes to the compiler or to core change whether compiler-builtins contains linkable calls

This is not clear from any of the linked discussion. Blame my lack of experience, but I'm not seeing it. Is this because compiler-builtins calls into these crates and those crates may then require linking? Or perhaps those crates need to be compiled and used to lower compiler-builtins to MIR? There seems to be two possible dependency relationships: functions calls (directly calling in the source) and compilation steps (one thing needs to be compiled before it can be used to compile another). I guess I'm not sure precisely which entities are involved and which type of dependency is causing the cycle.

@saethlin
Copy link
Member Author

saethlin commented Jul 6, 2024

compiler-builtins is here: https://github.com/rust-lang/compiler-builtins/

@saethlin
Copy link
Member Author

saethlin commented Jul 6, 2024

This is not clear from any of the linked discussion. Blame my lack of experience, but I'm not seeing it. Is this because compiler-builtins calls into these crates and those crates may then require linking?

When debug assertions are enabled, the assertions that the compiler adds contain a call to a symbol that is in core. There are also other things that we do in codegen that cause calls to the panic helpers to be inserted, which should all be unreachable, but the whole deal here is to make compiler-builtins builds with -Copt-level=0 and -Cdebug-assertions=yes link.

All the panic helpers are #[inline(never)] and monomorphic, which means that they are compiled to object code in the calling crate, then other crates which call them link to that upstream monomorphization, instead of compiling their own version of the function from MIR.

There seems to be two possible dependency relationships: functions calls (directly calling in the source) and compilation steps (one thing needs to be compiled before it can be used to compile another).

What you're getting at here is the part of the problem that I don't personally understand. The compiler-builtins crate declares a dependency on core, so what exactly is it about how the linker is invoked that means this dependency can't arise in linkage?


Or perhaps those crates need to be compiled and used to lower compiler-builtins to MIR?

The reverse of this is a potential solution. compiler-builtins is never explicitly called, so we need to compile everything in it to object code. But I wonder if it's possible to rig up something where the compiler knows to always encode MIR for core then when we compile compiler-builtins, the compiler magically knows that every function must be codegenned locally. But I suspect this doesn't help if one of those functions from core uses a static.

@littlebenlittle
Copy link

Okay, I am getting a much better picture (and kicking myself for not seeing that compiler-builtins is a different repo 🤦‍).

As you pointed out, core is a declared dependency: Cargo.toml

I see core being imported but not used in a lot places? The instrinsics! macro seems to be defined here.

Regardless, if core is used anywhere and compiler-builtins has a special use-case where linking cannot happen, this seems like a strong argument for caller-side inlining (apparently called top-down inlining). There is some discussion on that here) and one commenter points out that this isn't supported by LLVM and hence would need to be a MIR feature.

Top-down inlining seems like a fairly general solution that would solve this issue and possibly other problems, although I don't have any guess as to how difficult it would be to implement.

// if only...
#[inline_crate]
use core;

@saethlin
Copy link
Member Author

saethlin commented Jul 7, 2024

Yeah, you're describing the possible solution I was alluding to in that last paragraph. We can't apply the inlining optimization, but we can do local codegen (which is confusingly called inlining by some people and also some places in the compiler) which drops the linkage requirement, but statics must be instantiated only once, so they'll need a different solution.

@littlebenlittle
Copy link

Hmm, can you give some more background info on local codegen? I haven't seen this used anywhere before but it definitely seems useful enough to already exist.

I'm starting to get some idea of where a cycle could occur:

  1. Compile compiler-builtins, which calls into core
  2. Compile core, which may have instructions not supported by the arch and hence needs the object code from compiler-builtins

On first inspection, core can't be fully compiled to object code first because it needs compiler-builtins to be fully compiled. However compiler-builtins uses code defined in core, so it can't be compiled first, either. Womp!

Solution 0: #122580

Replace calls into core with aborts as necessary. Already well-discussed and seems to work in practice!

Solution A: Fused MIR

IIUC (and that's a big if), this is your most recent proposal. We compile core to and compiler-builtins to MIR and then inject the core code into compiler-builtins at the MIR level. Then the MIR code is passed as one big chunk to the code generation backend.

In some sense this is adding another compiler option to fuse MIR graphs before sending them to the code generator. Again IIUC, the reason rustc does not already fuse MIR is to minimize re-compilation of unchanged code, but here we have an edge-case where we really want that to prevent linking.

I think this is a pretty re-usable and elegant solution. But for completeness, here are some other thoughts:

Solution B: Abstraction Refinement

The above cycle is at the crate-level of granularity, so to resolve it we could move down an abstraction level to functions.

  1. Compile fn some_builtin. Calls into fn some_core_fn.
  2. Compile fn some_core_fn. Needs op some_other_builtin.
  3. Compile fn some_other_builtin. Calls into fn some_other_core_fn.

Yada yada yada...

Ultimately this needs to terminate. In theory proving termination is hard. In practice I can't imagine this function-level dependency being more than like 3 layers deep.

Solution C: Crate-version dependency

Instead of having core depend on the result of compiling the current version of compiler-builtins, we could use a previous version. This would require compiler-builtins to be strictly linear, i.e. only increase the number of supported instructions for every arch, and never introduce an incorrect implementation. Again in theory that's a pretty steep ask, but practically would only require re-building on the version before the incorrect implementation was introduced.

@saethlin
Copy link
Member Author

Hmm, can you give some more background info on local codegen? I haven't seen this used anywhere before but it definitely seems useful enough to already exist.

Basically, this function and all users of it:

/// Returns `true` if we should codegen an instance in the local crate, or returns `false` if we
/// can just link to the upstream crate and therefore don't need a mono item.
pub(crate) fn should_codegen_locally<'tcx>(tcx: TyCtxt<'tcx>, instance: Instance<'tcx>) -> bool {

That's not an easy set of things to track down, but the monomorphization system in rustc is definitely one of the worst-maintained parts. The core idea is that some items are compiled in the crate that uses them, not the crate that defines them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html E-help-wanted Call for participation: Help is requested to fix this issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants