Revert #94158, "Apply noundef metadata to loads of types that do not permit raw init" #98966

saethlin · 2022-07-06T01:33:56Z

In #94158, we started emitting noundef, which means that functions returning uninitialized references emit IR with is both ret void and also noundef: https://godbolt.org/z/hbjsKKfc3
More generally, this change makes mem::uninitialized dangerous in a way that it wasn't before. This noundef change was shipped in the past 2 stable releases, 1.61.0 and 1.62.0.

This concerns me because in #66151 we have thus far decided against issuing a panic for creating uninitialized references within arrays, on account of the breakage that shows up in crater. If this pattern is so widely-used that we're not willing to put in a runtime check for it, then it doesn't seem prudent to invite LLVM to exploit this UB.

The pattern of creating uninit references in arrays shows up real code because the 0.11, 0.12, and 0.13 release series for hyper all use mem::uninitialized to construct arrays of httparse::Header, which contains a &str and &[u8]. There is no patch available within these release series, so a cargo update is not a viable way to escape the UB here. Also note that the 0.11 release series of hyper predates MaybeUninit, so any source-level patch for that will incur runtime overhead. Which would be unfortunate, but preferable to UB.

I discussed this with @thomcc on the community Discord, and we think that it would be prudent to revert this introduction of noundef until we have a mitigation in place for the UB that this may unleash. We haven't been able to cook up any examples of surprising optimizations due to this pattern, but the whole point of noundef is to enable optimizations, and we know that there is code which uses it in a way which is definitely instant UB and which we have declined to inform users of.

If possible, we would like to see freeze applied to the return value of mem::uninitialized as a mitigation for this problem. That may be able to keep old code functioning without introducing a performance hit.

@rustbot labels add +T-compiler +I-compiler-nominated

@thomcc

…t do not permit raw init" In rust-lang#94158, we started emitting `noundef`, which means that functions returning uninitialized references emit IR with is both `ret void` and also `noundef`: https://godbolt.org/z/hbjsKKfc3 More generally, this change makes `mem::uninitialized` dangerous in a way that it wasn't before. This `noundef` change was shipped in the past 2 stable releases, 1.61.0 and 1.62.0. This concerns me because in rust-lang#66151 we have thus far decided against issuing a panic for creating uninitialized references within arrays, on account of the breakage that shows up in crater. If this pattern is so widely-used that we're not willing to put in a runtime check for it, then it doesn't seem prudent to invite LLVM to exploit this UB. The pattern of creating uninit references in arrays shows up real code because the 0.11, 0.12, and 0.13 release series for `hyper` all use `mem::uninitialized` to construct arrays of `httparse::Header`, which contains a `&str` and `&[u8]`. There is no patch available within these release series, so a `cargo update` is not a viable way to escape the UB here. Also note that the 0.11 release series of `hyper` predates `MaybeUninit`, so any source-level patch for that will incur runtime overhead. Which would be unfortunate, but preferable to UB. I discussed this with @thomcc on the community Discord, and we think that it would be prudent to revert this introduction of `noundef` until we have a mitigation in place for the UB that this may unleash. We haven't been able to cook up any examples of surprising optimizations due to this pattern, but the whole point of `noundef` is to enable optimizations, and we know that there is code which uses it in a way which is definitely instant UB and which we have declined to inform users of. If possible, we would like to see `freeze` applied to the return value of `mem::uninitialized` as a mitigation for this problem. That may be able to keep old code functioning without introducing a performance hit. @rustbot labels add +T-compiler +I-compiler-nominated

rustbot · 2022-07-06T01:33:59Z

Error: Label add can only be set by Rust team members

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

rust-highfive · 2022-07-06T01:34:00Z

r? @compiler-errors

(rust-highfive has picked a reviewer for you, use r? to override)

erikdesjardins · 2022-07-06T17:09:19Z

functions returning uninitialized references emit IR with is both ret void and also noundef: godbolt.org/z/hbjsKKfc3

define void @example_oof([2 x i8*]* noalias nocapture noundef readnone sret([2 x i8*]) dereferenceable(16) %0) {
  ret void
}

That example doesn't contain UB--noundef applies to the sret pointer itself (which is never undef), not the pointed-to return value. It's okay for the return value to be undef in that function.

This example does contain UB, because of the load (https://godbolt.org/z/P5cn4434T):

pub fn oof() -> &'static u8 {
    let refs: [&'static u8; 2] = unsafe { core::mem::uninitialized() };
    refs[0]
}

define noundef nonnull align 1 dereferenceable(1) i8* @oof() {
  ret i8* undef
}

In the optimized IR, it's UB for 2 reasons:

undef value returned with noundef attribute (since Apply noundef attribute to &T, &mut T, Box<T>, bool #93670)
undef value returned with dereferenceable attribute (since Rust 1.0 https://godbolt.org/z/YYoTxf63j)

(you can play with this in Alive2, e.g. https://alive2.llvm.org/ce/z/rwmt_r -- if the function can be replaced with unreachable, it's unconditional UB)

The original load in the unoptimized IR looks like (https://godbolt.org/z/c3bnPfa97):

%2 = load i8*, i8** %1, align 8, !nonnull !2, !align !3, !noundef !2

This is UB for 1 reason:

undef value loaded with !noundef metadata (since Apply noundef metadata to loads of types that do not permit raw init #94158)

So then the question is--does hyper load any undef references before initializing them?

The usual pattern of:

let foo = mem::uninitialized();
// initialize undef fields
foo.x = 1;
foo.y = 2;
// use foo
// ...

should not have any problematic loads, even if the uninitialized value is returned through multiple other functions first. (~~Provided it's returned indirectly, which is the case for everything > 1 pointer size, i.e. everything except [&Foo; 1] and equivalent types.~~ Edit: actually it's fine either way, since we don't apply such attributes to aggregates, e.g. https://godbolt.org/z/b9vvvcdTT)

If there is a load, doing almost anything with such a loaded reference except storing it (i.e. passing, returning, taking a reference to inner fields, god forbid actually dereferencing, etc.), has been UB since Rust 1.0.

I agree that we should implement mem::uninitialized at runtime with freeze(poison/undef). That will prevent some of this UB at the source.

(Not all of it though--freeze(poison/undef) can be refined to a constant, and LLVM can choose a constant that maximizes UB, e.g. 0 for references, which are nonnull. I think mem::uninitialized basically cannot be salvaged, at least without an "angelic" implementation that always makes a valid value for the type. But that likely has other problems.)

erikdesjardins · 2022-07-06T18:02:21Z

In short:

We don't apply noundef and friends to aggregates* (this includes arrays, even single-element arrays, e.g. https://godbolt.org/z/h4hb8f4M6), so UB only comes into play when you load individual fields of an undef aggregate at a scalar type, e.g. when loading &'static u8 from an undef [&'static u8; 2].

This may be (likely is?) still a concern if libraries are loading such undef fields. But the PR description seems written with the idea that the array creation itself is now UB, which is not the case.

*except those which inherit the ABI from their single field, e.g. https://godbolt.org/z/7e3vs3Wx1, but mem::uninitialized already panics in those cases

saethlin · 2022-07-06T18:05:52Z

It is good to hear this scenario is less dangerous than I feared.

does hyper load any undef references before initializing them?

No, but it creates references to them, which we apply dereferenceable to, which permits speculative loads. But it would be silly for there to ever be a load before they are initialized.

I think mem::uninitialized basically cannot be salvaged

To be clear, my concern is much more the inconsistency in the decision to not turn this pattern into a panic because of the breakage in crater, but to walk this pattern towards being dangerous UB. I do not mind being corrected that I'm just misunderstanding the situation.

compiler-errors · 2022-07-06T18:21:35Z

r? @nikic

nikic · 2022-07-06T18:41:21Z

No, but it creates references to them, which we apply dereferenceable to, which permits speculative loads. But it would be silly for there to ever be a load before they are initialized.

The !noundef metadata is control flow predicated -- it will be dropped if loads are speculated.

If possible, we would like to see freeze applied to the return value of mem::uninitialized as a mitigation for this problem. That may be able to keep old code functioning without introducing a performance hit.

FWIW in practice using freeze in this context is equivalent to using zero initialization. That would effectively be #87032. Haven't reread the discussion there, so not sure why it did not land. (Edit: And to be clear, I would be very much in favor of a change along those lines.)

Don't really have anything more to add here, I think @erikdesjardins already explained very well why the premise of this PR is to the most part incorrect.

saethlin · 2022-07-06T19:10:43Z

#87032 (comment)

I think there's a real risk that this could do more damage than it prevents in the cases that the values are not valid for an all-zeroes bitpattern (which isn't unrealistic at all). In that case the compiler backend will see us definitively initializing the value to an impossible bitpattern, rather than leaving it uninitialized (which, given that many C/C++ variables start out uninitialized, seems unlikely to be exploited in any way).

nikic · 2022-07-06T19:26:13Z

#87032 (comment)

I think there's a real risk that this could do more damage than it prevents in the cases that the values are not valid for an all-zeroes bitpattern (which isn't unrealistic at all). In that case the compiler backend will see us definitively initializing the value to an impossible bitpattern, rather than leaving it uninitialized (which, given that many C/C++ variables start out uninitialized, seems unlikely to be exploited in any way).

I mean, this is possible in the sense that any change can perturb optimization behavior in unexpected ways. But we don't intentionally optimize undef values worse than zero values -- if such cases exist, we would classify that as a missed optimization opportunity and be happy to fix it (fix = exploit more UB in this context).

erikdesjardins · 2022-07-07T06:32:20Z

I think there's a real risk that this could do more damage than it prevents in the cases that the values are not valid for an all-zeroes bitpattern (which isn't unrealistic at all). In that case the compiler backend will see us definitively initializing the value to an impossible bitpattern, rather than leaving it uninitialized (which, given that many C/C++ variables start out uninitialized, seems unlikely to be exploited in any way).

To echo @nikic--this seems to be making an assumption about the way LLVM works that is not really accurate.

Optimizations are generally written as if undef is refined to whatever value allows the most optimization. So any UB that can be exploited for a zero value (or an all-ones value, or...) should be exploited for undef as well.

(As nikic says, it's always possible that an optimization can behave differently for undef and zero. But if that results in less optimization for undef, that's an optimization quality bug that ought to be fixed in LLVM.)

As a trivial example of this: division by zero is UB in LLVM IR. Both 1 / 0 and 1 / undef are optimized to poison: https://godbolt.org/z/K8ahad69v.

RalfJung · 2022-07-11T16:54:16Z

I have certainly seen examples in the past where replacing mem::zeroed by mem::uninitialized made an undesired compilation result go away. So LLVM is not always great at that "make undef maximize UB" thing.

erikdesjardins · 2022-07-11T22:26:44Z

If you have an example I'll fix it upstream ;)

I should've been clearer though. I want to put much more weight on "staying with undef is a bad solution" than "zero in particular is a better solution". (I personally like freeze(undef), but I'll defer to nikic if he thinks it'll end up being the same as zero.)

My point is that upstream philosophy says undef/poison should be optimized at least as much as any constant. If that's not the case, it's an accident, so we should not be depending on a lack of optimizations for undef.

In particular I want to push back against the following:

(which, given that many C/C++ variables start out uninitialized, seems unlikely to be exploited in any way)

It is exploited in many ways! And more are coming with LLVM 15, with improvements to branch-on-undef UB, etc.

RalfJung · 2022-07-11T23:18:11Z

If you have an example I'll fix it upstream ;)

I doubt it still reproduces, but #52898 says the very first example used to SIGILL with zeroed but "work fine" with uninitialized.

nikic · 2022-07-12T07:52:06Z

@RalfJung Thanks, this still reproduces on the IR level: https://llvm.godbolt.org/z/dK9rz73ad There's multiple ways in which this could be optimized here, but I believe the main culprit here is mem2reg: https://llvm.godbolt.org/z/eK9Mdvx3T It places a nonnull assumption for the zero load, but not the undef load. From a brief look at the code, addAssumeNonNull (https://github.com/llvm/llvm-project/blob/f44d28f840c0b0877b09d5547fd09e191bbdc90e/llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp#L301) is not consistently called for replacements with undef values. Should be easy to fix.

nikic · 2022-07-12T10:55:25Z

After looking a bit closer, the generic mem2reg code was already handling this correctly, it's just the optimized single-block implementation that failed to preserve the nonnull assume. Fixed in llvm/llvm-project@3d475df. Thanks for sharing the example!

5225225 · 2022-07-12T11:26:43Z

Does LLVM say it's instant UB to do a noundef load that loads an undef value?

If so, can that be checked in memory sanitizer or similar? msan only seems to check branches on undef, but if llvm is going to use "this cannot be null, it's uninit, therefore it's UB" to optimize on, then that should abort under msan as well.

RalfJung · 2022-07-12T11:41:59Z

@nikic thanks for the quick fix. :)

RalfJung · 2022-07-12T12:02:21Z

This concerns me because in #66151 we have thus far decided against issuing a panic for creating uninitialized references within arrays, on account of the breakage that shows up in crater. If this pattern is so widely-used that we're not willing to put in a runtime check for it, then it doesn't seem prudent to invite LLVM to exploit this UB.

I would propose the following alternative to this PR: for the cases where mem::uninitialized does not panic, we make it fill memory by repeating the byte 0x01. That means at least all bool and nonnull assumptions for the type are satisfied, and of course noundef as well.
That also has the advantage that we make the old code (considered broken by today's standards, and often with a fix available in newer versions of the same crate, but not always semver-compatible) pay the cost, rather than new code that correctly uses MaybeUninit.

nikic · 2022-07-12T12:51:48Z

Does LLVM say it's instant UB to do a noundef load that loads an undef value?

If so, can that be checked in memory sanitizer or similar? msan only seems to check branches on undef, but if llvm is going to use "this cannot be null, it's uninit, therefore it's UB" to optimize on, then that should abort under msan as well.

MSan has an "eager checks" option which checks noundef arguments and return values. Rustc currently doesn't enable this option, but probably should? I believe it also makes msan faster in practice, but can't find the reference for that.

This option currently doesn't handle noundef loads, but could be extended in that direction.

5225225 · 2022-07-12T12:56:36Z

Rustc currently doesn't enable this option, but probably should?

Yeah, I think that'd be great. I did look online for if such a thing existed, but couldn't find anything. I'd be happy to try and help get that enabled by default in rustc (but more discussion of that should probably be in its own issue).

nikic · 2022-07-12T13:01:55Z

Rustc currently doesn't enable this option, but probably should?

Yeah, I think that'd be great. I did look online for if such a thing existed, but couldn't find anything. I'd be happy to try and help get that enabled by default in rustc (but more discussion of that should probably be in its own issue).

I opened #99179 to track.

RalfJung · 2022-07-12T13:43:58Z

I would propose the following alternative to this PR: for the cases where mem::uninitialized does not panic, we make it fill memory by repeating the byte 0x01. That means at least all bool and nonnull assumptions for the type are satisfied, and of course noundef as well.

PR for that is up at #99182.

pnkfelix · 2022-07-21T15:01:13Z

Discussed in T-compiler meeting.

I think the questions raised here should be resolved in the T-lang meeting, so I'm dropping the compiler nomination and replacing it with a T-lang nomination.

(Also, any discussion of the questions posed in this PR should include note of PR #99182 as one (partial, IIUC) mitigation of the problem described here.)

@rustbot label: +I-lang-nominated -I-compiler-nominated

joshtriplett · 2022-07-26T19:03:03Z

We discussed this in today's @rust-lang/lang meeting.

We felt that the proposed mitigation in mem::uninitialized() (#99182) seems like a reasonable step to work around issues.

With that change made, we're likely fine with not doing this revert, assuming other issues don't arise.

However, one non-consensus to explicitly note: there was no consensus to treat this as precedent to reject potential future changes, such as potentially using freeze or otherwise mitigating the use of LLVM's undef. (Concretely, I had objections to treating this as precedent, but don't have a problem with the concrete proposal on the table here.)

mem::uninitialized: mitigate many incorrect uses of this function Alternative to rust-lang#98966: fill memory with `0x01` rather than leaving it uninit. This is definitely bitewise valid for all `bool` and nonnull types, and also those `Option<&T>` that we started putting `noundef` on. However it is still invalid for `char` and some enums, and on references the `dereferenceable` attribute is still violated, so the generated LLVM IR still has UB -- but in fewer cases, and `dereferenceable` is hopefully less likely to cause problems than clearly incorrect range annotations. This can make using `mem::uninitialized` a lot slower, but that function has been deprecated for years and we keep telling everyone to move to `MaybeUninit` because it is basically impossible to use `mem::uninitialized` correctly. For the cases where that hasn't helped (and all the old code out there that nobody will ever update), we can at least mitigate the effect of using this API. Note that this is *not* in any way a stable guarantee -- it is still UB to call `mem::uninitialized::<bool>()`, and Miri will call it out as such. This is somewhat similar to rust-lang#87032, which proposed to make `uninitialized` return a buffer filled with 0x00. However - That PR also proposed to reduce the situations in which we panic, which I don't think we should do at this time. - The 0x01 bit pattern means that nonnull requirements are satisfied, which (due to references) is the most common validity invariant. `@5225225` I hope I am using `cfg(sanitize)` the right way; I was not sure for which ones to test here. Cc rust-lang#66151 Fixes rust-lang#87675

RalfJung · 2022-08-02T12:37:54Z

#99182 has landed, so this one should probably be closed?

saethlin · 2022-08-02T15:23:00Z

👍 Sounds good to me

mem::uninitialized: mitigate many incorrect uses of this function Alternative to rust-lang/rust#98966: fill memory with `0x01` rather than leaving it uninit. This is definitely bitewise valid for all `bool` and nonnull types, and also those `Option<&T>` that we started putting `noundef` on. However it is still invalid for `char` and some enums, and on references the `dereferenceable` attribute is still violated, so the generated LLVM IR still has UB -- but in fewer cases, and `dereferenceable` is hopefully less likely to cause problems than clearly incorrect range annotations. This can make using `mem::uninitialized` a lot slower, but that function has been deprecated for years and we keep telling everyone to move to `MaybeUninit` because it is basically impossible to use `mem::uninitialized` correctly. For the cases where that hasn't helped (and all the old code out there that nobody will ever update), we can at least mitigate the effect of using this API. Note that this is *not* in any way a stable guarantee -- it is still UB to call `mem::uninitialized::<bool>()`, and Miri will call it out as such. This is somewhat similar to rust-lang/rust#87032, which proposed to make `uninitialized` return a buffer filled with 0x00. However - That PR also proposed to reduce the situations in which we panic, which I don't think we should do at this time. - The 0x01 bit pattern means that nonnull requirements are satisfied, which (due to references) is the most common validity invariant. `@5225225` I hope I am using `cfg(sanitize)` the right way; I was not sure for which ones to test here. Cc rust-lang/rust#66151 Fixes rust-lang/rust#87675

rust-highfive assigned compiler-errors Jul 6, 2022

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 6, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 6, 2022

compiler-errors added the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Jul 6, 2022

rust-highfive assigned nikic and unassigned compiler-errors Jul 6, 2022

RalfJung mentioned this pull request Jul 12, 2022

mem::uninitialized: mitigate many incorrect uses of this function #99182

Merged

rustbot added I-lang-nominated Nominated for discussion during a lang team meeting. and removed I-compiler-nominated Nominated for discussion during a compiler team meeting. labels Jul 21, 2022

saethlin closed this Aug 2, 2022

saethlin deleted the revert-noundef branch September 21, 2022 03:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert #94158, "Apply noundef metadata to loads of types that do not permit raw init" #98966

Revert #94158, "Apply noundef metadata to loads of types that do not permit raw init" #98966

saethlin commented Jul 6, 2022

rustbot commented Jul 6, 2022

rust-highfive commented Jul 6, 2022

erikdesjardins commented Jul 6, 2022 •

edited

Loading

erikdesjardins commented Jul 6, 2022 •

edited

Loading

saethlin commented Jul 6, 2022

compiler-errors commented Jul 6, 2022

nikic commented Jul 6, 2022 •

edited

Loading

saethlin commented Jul 6, 2022

nikic commented Jul 6, 2022

erikdesjardins commented Jul 7, 2022

RalfJung commented Jul 11, 2022

erikdesjardins commented Jul 11, 2022

RalfJung commented Jul 11, 2022

nikic commented Jul 12, 2022

nikic commented Jul 12, 2022

5225225 commented Jul 12, 2022

RalfJung commented Jul 12, 2022

RalfJung commented Jul 12, 2022

nikic commented Jul 12, 2022

5225225 commented Jul 12, 2022

nikic commented Jul 12, 2022

RalfJung commented Jul 12, 2022

pnkfelix commented Jul 21, 2022 •

edited

Loading

joshtriplett commented Jul 26, 2022 •

edited by pnkfelix

Loading

RalfJung commented Aug 2, 2022

saethlin commented Aug 2, 2022

Revert #94158, "Apply noundef metadata to loads of types that do not permit raw init" #98966

Revert #94158, "Apply noundef metadata to loads of types that do not permit raw init" #98966

Conversation

saethlin commented Jul 6, 2022

rustbot commented Jul 6, 2022

rust-highfive commented Jul 6, 2022

erikdesjardins commented Jul 6, 2022 • edited Loading

erikdesjardins commented Jul 6, 2022 • edited Loading

saethlin commented Jul 6, 2022

compiler-errors commented Jul 6, 2022

nikic commented Jul 6, 2022 • edited Loading

saethlin commented Jul 6, 2022

nikic commented Jul 6, 2022

erikdesjardins commented Jul 7, 2022

RalfJung commented Jul 11, 2022

erikdesjardins commented Jul 11, 2022

RalfJung commented Jul 11, 2022

nikic commented Jul 12, 2022

nikic commented Jul 12, 2022

5225225 commented Jul 12, 2022

RalfJung commented Jul 12, 2022

RalfJung commented Jul 12, 2022

nikic commented Jul 12, 2022

5225225 commented Jul 12, 2022

nikic commented Jul 12, 2022

RalfJung commented Jul 12, 2022

pnkfelix commented Jul 21, 2022 • edited Loading

joshtriplett commented Jul 26, 2022 • edited by pnkfelix Loading

RalfJung commented Aug 2, 2022

saethlin commented Aug 2, 2022

erikdesjardins commented Jul 6, 2022 •

edited

Loading

erikdesjardins commented Jul 6, 2022 •

edited

Loading

nikic commented Jul 6, 2022 •

edited

Loading

pnkfelix commented Jul 21, 2022 •

edited

Loading

joshtriplett commented Jul 26, 2022 •

edited by pnkfelix

Loading