Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

causes UB by calling mem::zeroed on any type #2

Closed
RalfJung opened this issue Jan 23, 2019 · 17 comments · Fixed by #3
Closed

causes UB by calling mem::zeroed on any type #2

RalfJung opened this issue Jan 23, 2019 · 17 comments · Fixed by #3

Comments

@RalfJung
Copy link
Collaborator

This code is highly problematic:

    pub unsafe fn new<F: for<'a> FnOnce(&'a T) -> &'a U>(f: F) -> Self {
        // Construct a "fake" T. It's not valid, but the lambda shouldn't
        // actually access it (which is why this is unsafe)
        let x = mem::zeroed();

If T is, for example, a struct containing reference type, then this causes UB by initializing a reference with 0. The comment seems to say that it is okay to do this as long as x is not accessed; that is not correct.

That said, there currently is no correct way to do what you want. This bug can only really be fixed once MaybeUninit gets stabilized.

(I also feel really uneasy with the extend to which this relies on type inference, but YMMV.^^)

@Diggsey
Copy link
Owner

Diggsey commented Jan 23, 2019

Well... my excuse is that the rust unsafe code guidelines didn't exist when the crate was made, and this would have been valid in C/C++ 😛 So uh.. thanks!

Does MaybeUninit actually help? The lambda still needs to access (the address of) a field on the uninitialised value in the union.

Does the UB come from owning the uninitialised value, or does it come from having a live expression of type T, where T may be uninhabited, or something else?

It seems like an intrinsic maybe needed or there will still be no way to do this without invoking UB?

@RalfJung
Copy link
Collaborator Author

You might be interested in this discussion on internals that brought me here. :)

Does the UB come from owning the uninitialised value, or does it come from having a live expression of type T, where T may be uninhabited, or something else?

UB comes from operating on (in this case, assigning) a value of a type T that does not satisfy the validity invariant of that type.

Does MaybeUninit actually help? The lambda still needs to access (the address of) a field on the uninitialised value in the union.

You are fine as long as you avoid references. This is where you also need rust-lang/rfcs#2582.

@Diggsey
Copy link
Owner

Diggsey commented Jan 23, 2019

@RalfJung I see, the only bit that seems kindof a grey area for me is the fields access, eg.

&pointer_to_uninit.field

Coming from C++, my intuition for how lvalues work is that they are essentially the same as implicit references. eg.

auto x = *ptr + 1;

This is actually taking the rvalue ptr, the * operator converts it into an lvalue with the address of ptr, and then it's only when that lvalue is implicitly converted to an rvalue in order to perform the addition, that it actually gets dereferenced.

And this explains why the following does not actually read the memory referred to by ptr:

auto y = &(*ptr).foo;

Again ptr is converted to an lvalue with the address of ptr, then it does a field access (which for lvalues is the same as adding an offset to their address) and then we use &, which (for lvalues) is a no-op that converts them back into a normal pointer. Because the lvalue is never implicitly converted to an rvalue, the memory is never actually dereferenced, even though the dereference operator is used.

In this model, the rules for uninit data are easy: you are just not allowed to have uninit data in an rvalue at any time.

Now, going back to rust-land: if I try to find corresponding rules, it's much trickier, because now it matters whether an lvalue was derived from a reference vs a raw pointer or union access: let's call it raw_lvalue vs lvalue.

Now the rule must be that neither lvalues nor rvalues can contain uninit data, but raw_lvalues can (or else the field offset in &raw uninit.value.field would be invalid.

Without these rules, it's difficult to specify what kinds of operations are allowed to operate on uninit data: eg. you say that assignment cannot operate on uninit data, and you can derive that by saying that assignment operates on lvalues (and so raw_lvalues must first be converted to lvalues) whereas field access is defined for all three of raw_lvalues, lvalues and rvalues.

@RalfJung
Copy link
Collaborator Author

This is actually taking the rvalue ptr, the * operator converts it into an lvalue with the address of ptr, and then it's only when that lvalue is implicitly converted to an rvalue in order to perform the addition, that it actually gets dereferenced.

This matches Rust.

if I try to find corresponding rules, it's much trickier, because now it matters whether an lvalue was derived from a reference vs a raw pointer or union access: let's call it raw_lvalue vs lvalue.

Why do you think unions matter? I don't think they do. Actually I think the model should be uniform for all places (that's how lvalues are called in Rust), and the only thing it should assert is that the memory this points to is allocated (with the size given by the type). But so far that is just my model, not something we all agreed on yet.

Only rvalues (for which we are still searching a better term in Rust land) have to be valid at their given type.

@Diggsey
Copy link
Owner

Diggsey commented Jan 23, 2019

OK, I think I understand. If I create a (normal) reference to an uninit lvalue:

let x = &maybe_uninit.value;

The uninitialised value is never in an rvalue, and so it's fine for it to be uninitialised. However, a reference to it is created which is an rvalue, and references must point to valid data to themselves be valid, and that's the source of the UB.

If I could somehow create a reference such that the reference itself was an lvalue, then that reference could point to uninitialised data, without invoking UB, but that's not possible because & always returns an rvalue.

As for naming rvalues, I think it would be tricky to improve on rvalue given how obscure the concept is, because there's never a good name for obscure concepts, and sometimes just using any existing precedent is better. Maybe "materialized", "actualized" or "reified" value.

@RalfJung
Copy link
Collaborator Author

RalfJung commented Jan 23, 2019

However, a reference to it is created which is an rvalue, and references must point to valid data to themselves be valid, and that's the source of the UB.

Exactly. Except that what exactly validity of references is is not yet finalized, we are collecting data and going to be discussing that in rust-lang/unsafe-code-guidelines#77.

In my reading references have to point to allocated memory but it is okay if that memory is not initialized. But there might be reasons to require the stronger invariant.

If I could somehow create a reference such that the reference itself was an lvalue, then that reference could point to uninitialised data, without invoking UB, but that's not possible because & always returns an rvalue.

Not sure what you mean here, but the point of rust-lang/rfcs#2582 is that you can turn a place immediately into an rvalue of type *mut T, and that does not have any requirements about memory being initialized.

@RalfJung RalfJung changed the title calls mem::zeroed on any type causes UB by calling mem::zeroed on any type Jul 7, 2019
@RalfJung
Copy link
Collaborator Author

Reading through this again, I don't know why we talked about field access here. The UB is already in this line:

        // Construct a "fake" T. It's not valid, but the lambda shouldn't
        // actually access it (which is why this is unsafe)
        let x = mem::zeroed();

This "produces" a value from 0-filled memory for any type T, which has been documented as UB in the Nomicon for quite a while.

Do you think there is any chance this could use the memoffset crate? We spent quite some time getting the offset_of! in that crate to be as safe as it can be for the versions of Rust it supports (which is Rust 1.25+).

@Diggsey
Copy link
Owner

Diggsey commented Aug 10, 2019

I don't believe that version of the offset_of! macro is sufficiently safe for what this crate requires. The way it gets the address of a field may invoke Deref and result in an invalid field offset.

The way I get the address in this crate is by using pattern matching, so that offset_of! can only return valid field offsets, which can then be encapsulated in the safe FieldOffset type.

Unfortunately, there is no equivalent of ref for raw pointers when pattern matching.

@RalfJung
Copy link
Collaborator Author

RalfJung commented Aug 10, 2019

I don't believe that version of the offset_of! macro is sufficiently safe for what this crate requires. The way it gets the address of a field may invoke Deref and result in an invalid field offset.

No, it specifically checks for that case first, before doing the field access. We went for that approach to make it easier to convert to rust-lang/rfcs#2582 once that lands, and to keep the references to uninitialized memory that we create to a minimum.

@RalfJung
Copy link
Collaborator Author

RalfJung commented Aug 26, 2019

@Diggsey did you have a chance to take a second look? I am fairly sure that your analysis above is incorrect and that memoffset successfully protects from Deref.

The use of mem::zeroed here means that code will likely cause SIGILL when this is used with a type that must not be null. Unlike the currently unavoidable UB that is still left in memoffset, the approach taken here is known to cause "miscompilations" (as in, things not being compiled the way the user expects -- there is no compiler bug). In particular, this crate will trigger the new lint warning about invalid values.

@Diggsey
Copy link
Owner

Diggsey commented Aug 26, 2019

Yes, I don't have any concerns with the new implementation now, but was considering whether I should make this a breaking change:

The existing (unsafe) FieldOffset::new method cannot be used safely with all types as it uses references rather than pointers.

The current documentation says that when using this method, you should not access the referenced value, but this is no longer sufficient: if I keep the method as-is, it will be invalid to call on any type which cannot be zeroed, so I would need to update the docs and add a second constructor without that constraint. However, it looks like in that case the new lint will still fire? (Shouldn't the lint be disabled for generic parameters if the method is unsafe, because it could simply be part of the contract that T must be zero-able?)

Anyway, I'm leaning towards just making it a breaking change to avoid any surprises, so this shouldn't be an issue.

@Diggsey
Copy link
Owner

Diggsey commented Aug 26, 2019

I've opened a PR: #3

Would you mind reviewing it?

@RalfJung
Copy link
Collaborator Author

RalfJung commented Aug 26, 2019

The lint only fires if the type is definitely not zero-able... so yeah I guess it will not apply here. I thought that code was in a macro, but it is not.

Anyway, I'm leaning towards just making it a breaking change to avoid any surprises, so this shouldn't be an issue.

Ah, I didn't realize this would have to be a breaking change.

Given the unsoundness of the old version, I wonder if it is worth either having a patch release that's still not sound but better (it could use a dangling reference instead of zeroed, similar to what memoffset does for pre-MaybeUninit versions of rustc), or else yanking the old version?

Would you mind reviewing it?

Thanks! I'll take a look.

@RalfJung
Copy link
Collaborator Author

Not sure why I need owner access for the review but thanks anyway. ;)
I will not have time for helping out with maintenance though, I am afraid.

@Diggsey
Copy link
Owner

Diggsey commented Aug 26, 2019

Github wouldn't let me add you as a reviewer without first adding you as a contributor. Don't worry I'm not expecting you to help maintain 😛

@Diggsey
Copy link
Owner

Diggsey commented Aug 26, 2019

Btw, I did find this line to be a little... dangerous: https://github.com/Gilnaa/memoffset/blob/master/src/offset_of.rs#L33
(The fact that it relies on expanding into the same scope)

Maybe it's just personal taste, but I prefer to put as much of the unsafety outside of macros as possible, because in a macro, you might rely on something which today would be a syntax error, but in tomorrow's rust might be valid code and do something totally unexpected.

@RalfJung
Copy link
Collaborator Author

I can't really imagine anything that would break that, do you have an example?

Also this helps factor the code, which is important when dealing with unsafe. I'd rather not repeat myself.

@Diggsey Diggsey closed this as completed in #3 Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants