Make sure our handling of partially initialized values is compatible with LLVM / We cannot use LLVM poison semantics #94428
Labels
A-codegen
Area: Code generation
A-LLVM
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
A-MIR
Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html
T-compiler
Relevant to the compiler team, which will review and decide on the PR/issue.
T-opsem
Relevant to the opsem team
The intent is for types like `MaybeUninit<u64>` to support dealing with partially initialized data: e.g., if we have a `(u32, u16)` (and assuming for a second we could rely on its layout), it should be sound to transmute that to `MaybeUninit<u64>` and back even though the padding between the two tuple fields might be uninitialized. Code like #94212 relies on this.

The thing is, we are compiling
`MaybeUninit<u64>` to `i64` for LLVM -- `MaybeUninit` is `repr(transparent)`. This was required to avoid codegen regressions when `MaybeUninit` started to be used in some hot data copying loops inside libcore. So, for this all to work out, we had better be sure that `i64` correctly preserves partially initialized data.

LLVM has two kinds of "uninit" data,
`undef` and `poison`. `undef` is per-bit and precisely preserved in all `iN` types, so we should be fine here. `poison`, however, is per-value: when loading an `i64`, if any of its bytes is `poison`, the entire result is `poison`. That is exactly what we do not want for `MaybeUninit<u64>`. However, at least in current LLVM, `poison` is only created in very few situations (such as "nowrap" arithmetic that overflows), and AFAIK none of them can happen in a UB-free Rust program -- so, basically "uninit" in Rust only ever corresponds to `undef` in LLVM, never to `poison`. (But I might have missed places where LLVM generates `poison`.)

So I think right now we are good. However, LLVM is slowly moving away from
`undef` and towards `poison`, since `undef` is seriously ill-behaved in many ways. And if that ever means that "uninit" in Rust could correspond to LLVM `poison`, then we have a problem here -- we have to keep monitoring this situation, and it might be good for us to be involved in the relevant LLVM discussions as well, to make sure they are aware of this problem.

Similarly, as we evolve the MIR semantics we have to make sure that no UB-free program can generate
`poison` after compilation to LLVM.

A very elegant solution to this issue would be for LLVM to adopt the "byte type" proposal. However, so far my impression is that the LLVM community is not convinced they need such a type. With a byte type,
`MaybeUninit<u64>` could be easily compiled to `b64` in LLVM, and a byte type would preserve `poison` precisely, so we'd be all good.

I am mostly opening this so we have some place to track the current situation, and to make sure everyone agrees on what the main concerns are here -- and to get input from folks with more LLVM experience in case I got some of this wrong.
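
For concreteness, here is a minimal sketch of the round trip described at the top of this issue. It swaps the tuple for a hypothetical `#[repr(C)]` struct named `Pair`, so that the layout can actually be relied upon. With this layout the two padding bytes are trailing rather than between the fields, but the concern is the same. The helper name `round_trip` is also made up for illustration.

```rust
use std::mem::{self, MaybeUninit};

// Hypothetical stand-in for `(u32, u16)` with a guaranteed layout:
// 4 bytes for `a`, 2 bytes for `b`, then 2 bytes of trailing padding.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pair {
    a: u32,
    b: u16,
}

// Move a `Pair` through `MaybeUninit<u64>` and back.
// The padding bytes may be uninitialized while the value is held
// as `MaybeUninit<u64>`; the intent stated in this issue is that
// this is nevertheless sound.
fn round_trip(p: Pair) -> Pair {
    // Both types are 8 bytes, which `transmute` checks at compile time.
    let raw: MaybeUninit<u64> = unsafe { mem::transmute(p) };
    // SAFETY (under the intended semantics): all non-padding bytes of
    // `raw` are initialized because they came from a valid `Pair`.
    unsafe { mem::transmute(raw) }
}
```

If the `i64` that `MaybeUninit<u64>` is compiled to were ever loaded with per-value `poison` semantics, the fields `a` and `b` could be lost along with the padding, which is the problem this issue is tracking.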
Cc @rust-lang/wg-unsafe-code-guidelines @rust-lang/wg-llvm
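
To make the per-bit vs. per-value distinction above more tangible, here is a toy model in Rust. It is not LLVM's actual semantics, just a sketch under simplifying assumptions: "uninit" is tracked per byte rather than per bit, and all names (`AbstractByte`, `load_per_byte`, `load_per_value`) are hypothetical.

```rust
// One byte of abstract memory: either a concrete value or uninitialized.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AbstractByte {
    Init(u8),
    Uninit,
}

// `undef`-like load (simplified to byte granularity): each byte keeps
// its own state, so initialized bytes survive next to uninitialized ones.
fn load_per_byte(mem: [AbstractByte; 8]) -> [AbstractByte; 8] {
    mem
}

// `poison`-like load: a single uninitialized byte taints the whole value.
// `None` models "the entire result is poison".
fn load_per_value(mem: [AbstractByte; 8]) -> Option<u64> {
    let mut bytes = [0u8; 8];
    for (dst, src) in bytes.iter_mut().zip(mem.iter()) {
        match src {
            AbstractByte::Init(b) => *dst = *b,
            AbstractByte::Uninit => return None,
        }
    }
    Some(u64::from_le_bytes(bytes))
}
```

In this model, an 8-byte value with six initialized bytes and two uninitialized padding bytes keeps its six bytes under `load_per_byte`, but becomes entirely unusable under `load_per_value`. The former is what `MaybeUninit<u64>` needs.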