Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::ptr::write_bytes should clearly explain safety and validity invariants #84184

Open
lokegustafsson opened this issue Apr 14, 2021 · 10 comments
Labels
A-raw-pointers Area: raw pointers, MaybeUninit, NonNull C-bug Category: This is a bug. E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.

Comments

@lokegustafsson
Copy link

This is an example code snippet for std::ptr::write_bytes:

use std::ptr;

let mut v = Box::new(0i32);

unsafe {
    // Leaks the previously held value by overwriting the `Box<T>` with
    // a null pointer.
    ptr::write_bytes(&mut v as *mut Box<i32>, 0, 1);
}

// At this point, using or dropping `v` results in undefined behavior.
// drop(v); // ERROR

// Even leaking `v` "uses" it, and hence is undefined behavior.
// mem::forget(v); // ERROR

// In fact, `v` is invalid according to basic type layout invariants, so *any*
// operation touching it is undefined behavior.
// let v2 = v; // ERROR

unsafe {
    // Let us instead put in a valid value
    ptr::write(&mut v as *mut Box<i32>, Box::new(42i32));
}

// Now the box is fine
assert_eq!(*v, 42);

This example writes a null pointer to a box, which (verbatim!) "is invalid according to basic type layout invariants". Then it incorrectly states that this is fine as long as we do not "touch" the box, while really this is already UB. This looks like a documentation bug to me.

As a side note: why is Unique<T> not intended to be stabilized? The strong aliasing guarantees could be useful in someone's unsafe code.

@lokegustafsson lokegustafsson added the C-bug Category: This is a bug. label Apr 14, 2021
@workingjubilee
Copy link
Member

From the Reference:

Behavior Considered Undefined

  • Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):

This does indeed seem to be canonical UB! The documentation should either explain that this first unsafe is UB, and then omit the other misleading remarks, or omit that first unsafe example and the consequent comments entirely.

@workingjubilee workingjubilee added E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. A-raw-pointers Area: raw pointers, MaybeUninit, NonNull labels Apr 17, 2021
@asquared31415
Copy link
Contributor

The docs on write_bytes even say

Additionally, the caller must ensure that writing count * size_of::<T>() bytes to the given region of memory results in a valid value of T.

However the example after proceeds to do exactly that and is breaking both the safety invariants of the function and causes UB.

@RalfJung
Copy link
Member

I think this code is technically fine under our current UB rules since the "bad" Box is never actually "created" as a Box -- all this code does is write some bytes to a location in memory that happens to store a Box, but before anyone even does a move of that Box, it fixes things up.

The following would be definite UB:

use std::ptr;

let mut v = Box::new(0i32);

unsafe {
    // Leaks the previously held value by overwriting the `Box<T>` with
    // a null pointer.
    ptr::write_bytes(&mut v as *mut Box<i32>, 0, 1);
}

let v2 = v; // UB! We are doing a typed copy of a `Box` that does not satisfy its validity invariant.

This is an instance of rust-lang/unsafe-code-guidelines#84.

@asquared31415
Copy link
Contributor

I suppose that's true, but even if the example isn't UB it breaks the safety requirements of the function and probably is not a good example.

@workingjubilee
Copy link
Member

🤔 I see, I misunderstood. It seems that describing the situation more accurately in the reference is also desirable, though that should perhaps be logged as an issue against the reference. Roughly it seems like the actual UB is "Permitting an invalid value to be observed, or interacting with a memory location that contains an invalid value in any way other than assigning a new value to it". The reference already mentions it is non-exhaustive (and also non-normative) so it's fine if this doesn't include all the possible cases.

It seems these kinds of considerations about "validity invariants" and "safety invariants" should be discussed, briefly, at the top of std::ptr's documentation also, and not necessarily just in each function? I realize these kinds of rules aren't finalized but this kind of multi-level thinking is necessary when interacting with highly unsafe code, and the current docs include a relatively unnuanced notion of "validity": https://doc.rust-lang.org/nightly/std/ptr/index.html#safety

@RalfJung
Copy link
Member

It seems these kinds of considerations about "validity invariants" and "safety invariants" should be discussed, briefly, at the top of std::ptr's documentation also, and not necessarily just in each function? I realize these kinds of rules aren't finalized but this kind of multi-level thinking is necessary when interacting with highly unsafe code, and the current docs include a relatively unnuanced notion of "validity": https://doc.rust-lang.org/nightly/std/ptr/index.html#safety

That "validity" is actually orthogonal to the one in that reference page... the ptr module talks talk about "pointer validity" which defines when a (raw) pointer is allowed to read/write some memory.

But yes, this issue of validity invariants is cross-cutting. Another good place to put this (and to link to from everywhere else) could be in the MaybeUninit docs.

It seems that describing the situation more accurately in the reference is also desirable, though that should perhaps be logged as an issue against the reference. Roughly it seems like the actual UB is "Permitting an invalid value to be observed, or interacting with a memory location that contains an invalid value in any way other than assigning a new value to it". The reference already mentions it is non-exhaustive (and also non-normative) so it's fine if this doesn't include all the possible cases.

We should probably use some more-or-less arbitrary but specific term ("construct" seems okay to me) but then have a dedicated paragraph defining what exactly that means.

@workingjubilee workingjubilee changed the title std::ptr::write_bytes documentation example is UB std::ptr::write_bytes should clearly explain safety and validity invariants Apr 22, 2021
@workingjubilee workingjubilee added E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. and removed E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. labels Apr 22, 2021
@saethlin
Copy link
Member

I just came upon this independently because I had reason to be reading over the docs for write_bytes. From reading the above-linked rust-lang/unsafe-code-guidelines#84 it sounds like breaking then re-establishing invariants is a useful or at least widely-used pattern.

I don't understand what exactly "safety and validity invariants" mentioned in the title of this issue are, but I would prefer to see the docs here touched up a little bit. As a non-expert in the UCG I do not understand how &mut v doesn't count as a use but drop(v) does.

Therefore, I think it would be a significant improvement to the docs if they acknowledged that this pattern is subtle but valid. The comment // Let us instead put in a valid value seems too casual to me, and too much like the sort of comment I've seen over an invalid unsafe block. Maybe something like this would be better?

/// unsafe {
///     // This `&mut v` does not count as a use of `v`, but lets us write a valid value over the invalid `Box`
///     ptr::write(&mut v as *mut Box<i32>, Box::new(42i32));
/// }
///
/// // Because we've re-established the invariants of the Box before using it, we can now do whatever we want in safe code.
/// assert_eq!(*v, 42);

@RalfJung
Copy link
Member

acknowledged that this pattern is subtle but valid

I think it should be valid, but I don't think we have an RFC or a lang-team decision saying that it is valid. So we can't really recommend using it in the docs currently.

@saethlin
Copy link
Member

So we can't really recommend using it in the docs currently.

But there's an example in the docs that uses this pattern, isn't that equivalent to a recommendation? Should this pattern be eliminated from the docs for now?

@RalfJung
Copy link
Member

Hm, good point...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-raw-pointers Area: raw pointers, MaybeUninit, NonNull C-bug Category: This is a bug. E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.
Projects
None yet
Development

No branches or pull requests

6 participants
@RalfJung @saethlin @lokegustafsson @asquared31415 @workingjubilee and others