Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for maybe_uninit_as_bytes #93092

Open
1 of 4 tasks
Amanieu opened this issue Jan 19, 2022 · 9 comments
Open
1 of 4 tasks

Tracking Issue for maybe_uninit_as_bytes #93092

Amanieu opened this issue Jan 19, 2022 · 9 comments
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Amanieu
Copy link
Member

Amanieu commented Jan 19, 2022

Feature gate: #![feature(maybe_uninit_as_bytes)

This is a tracking issue for APIs which allow access to individual bytes of a MaybeUninit<T>. This is always safe even for padding bytes since the bytes are themselves represented as MaybeUninit<u8>.

Public API

// core::mem

impl<T> MaybeUninit<T> {
    pub fn as_bytes(&self) -> &[MaybeUninit<u8>];
    pub fn as_bytes_mut(&mut self) -> &mut [MaybeUninit<u8>];
}

impl<T> [MaybeUninit<T>] {
    pub const fn as_bytes(&self) -> &[MaybeUninit<u8>];
    pub const fn as_bytes_mut(&mut self) -> &mut [MaybeUninit<u8>];
}

Steps / History

Unresolved Questions

  • None yet.
@Amanieu Amanieu added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Jan 19, 2022
@CryZe
Copy link
Contributor

CryZe commented Jan 20, 2022

Would it maybe make sense for these to be arrays of size_of::<T>() elements instead? (I don't recall if this is allowed in stable Rust yet)

@5225225
Copy link
Contributor

5225225 commented Jan 20, 2022

The only thing I can think of that can possibly be an issue is making a type with padding into a &mut [MaybeUninit<u8>], and then calling .write() on a padding byte.

Like

use std::mem::MaybeUninit;

#[repr(C)]
#[derive(Default)]
struct Foo {
    a: u16,
    b: u8
}

fn main() {
    let mut f = Foo::default();
    let fmur = &mut f as *mut Foo as *mut MaybeUninit<u8>;
    unsafe {
        dbg!(*std::slice::from_raw_parts_mut(fmur, 4)[3].write(42));
    }
}

This is presumably sound because padding bytes are only made uninit when you write a field of type Foo, not just "it came from a Foo and is therefore always uninit".

But also I'd need to have a think about how that interacts with interior mutability. Presumably sound, in any case.

@RalfJung
Copy link
Member

In principle we could have such methods for any type U, not just u8 (after adjusting the size arithmetic accordingly). But I guess there is no good reason to have those?

@5225225
Copy link
Contributor

5225225 commented Jan 27, 2022

If you allow casting to a type U, then you need to solve the problem of

  • What if U is a ZST?
  • What if there's not a multiple of size_of(U) bytes in the MaybeUninit?
  • What if the align of U is more than T, and the slice is misaligned for a U?

These could be solved by panicking or returning an error, but none of these are problems when U is size 1 align 1.

@kupiakos
Copy link
Contributor

kupiakos commented Apr 14, 2022

Should as_bytes_mut be instead named as_mut_bytes as recommended by the API guidelines? Same with slice_as_bytes_mut/slice_as_mut_bytes.

If the mut qualifier in the name of a conversion method constitutes part of the return type, it should appear as it would appear in the type. For example Vec::as_mut_slice returns a mut slice; it does what it says. This name is preferred over as_slice_mut.

I recognize the stdlib is not consistent on this. See str::as_bytes_mut, which also outputs a &mut [u8] instead of a &mut [MaybeUninit<u8>] while being called bytes. If the _mut_X order really is preferred, we should stick with it for future APIs.

@ais523
Copy link

ais523 commented Jun 12, 2024

The current implementation of as_bytes (and slice_as_bytes) is unsound in the sense that it allows undefined behaviour in unsafe code that obeys all the stated safety guidelines:

#![feature(maybe_uninit_as_bytes)]
use std::mem::MaybeUninit;
use std::cell::Cell;

fn unsoundness_demo(a: &u8, b: &Cell<u8>) {
    println!("{}", a);
    b.set(2);
    println!("{}", a);
}

fn main() {
    unsafe {
        let mut u = MaybeUninit::uninit();
        u.write(Cell::new(1));
        unsoundness_demo(u.as_bytes()[0].assume_init_ref(), u.assume_init_ref());
    }
}

Playground

unsoundness_demo's first argument a is a shared reference without interior mutability, yet the thing it pointed to was modified, which is undefined behaviour. However, the only unsafe method called here is assume_init_ref, and the code upholds the stated safety guideline for that method (that the content is fully initialised – every byte of both the MaybeUninit<Cell<u8>> and the MaybeUninit<u8> is set to a known value that is legal for the data type).

This problem could be fixed either via adding a new safety requirement on assume_init_ref and friends that the MaybeUninit<T> has not been converted from a MaybeUninit<U> where types T and U differ in interior mutability (perhaps by considering a byte to be uninitialised if copied from a type with different interior mutability for that byte), or via banning the use of as_bytes if either the source or destination type has interior mutability.


@RalfJung wrote:

In principle we could have such methods for any type U, not just u8 (after adjusting the size arithmetic accordingly). But I guess there is no good reason to have those?

There is actually an important use case for being able to do the equivalent of as_bytes_mut between arbitrary MaybeUninit types. A common pattern in C is to write a function in which the caller of a function allocates storage for the result of the function call, and the function writes its result to that storage (e.g. this is used by the C standard library function snprintf). In Rust, it is possible to do this in safe code by having the function return a reference back to the same storage (i.e. the function takes a &'a mut MaybeUninit<T> as its argument, initialises it using write and returns the &mut T that was returned by write as its own return value).

Although this works well if you know the return type already, there are cases where you'd want to do this sort of thing but don't know the type in advance. For example, I've been recently looking into "local allocators" in which allocations are made one at a time on-demand, but where deallocations are not allowed while the local allocator is alive (rather, dropping the local allocator drops all the allocations at once). This is a very efficient pattern which is commonly used in game development. In general the applications of this sort of allocator are fairly similar to the applications of Rc (you can even use it to create structures that reference cycles by using it together with Cell, just like you can with Rc), except that the deallocations happen all at once rather than when no references remain – but you get a number of advantages in return (no risk of accidentally leaking memory due to a reference cycle, substantially better performance due to not needing a separate allocation for every Rc and not needing to store or update reference counts, and the references are Copy whereas an Rc can't be embedded in a type that is Copy), and one disadvantage (you can't soundly run the destructors of the types you allocated) which doesn't matter in most of the cases where you'd want to use this. Via using MaybeUninit::write(), it is possible to implement this entirely in safe code.

If there were a safe transmute_mut for MaybeUninit that allowed transmutation between &mut MaybeUninit<T> and &mut MaybeUninit<U> whenever the source reference was sufficiently large, and sufficiently aligned, to be read as a reference to the destination type, then this sort of allocator would be much more powerful, as it could allocate data of more than one type. (Presently, it needs to initially allocate its data storage as a [MaybeUninit<T>], meaning that it can only allocate Ts. With a safe transmute_mut, it could alternatively allocate its data storage as a [MaybeUninit<u8>] and transmute slices of that into the appropriate type in order to do allocations.) Although this is currently possible to work around by using a different data storage for each possible T, this precludes the situation where data of a type with a lifetime is allocated in a loop (with a different lifetime each time round the loop, meaning that T is a slightly different type each time). As far as I can tell, transmuting an &'a mut MaybeUninit<T> into an &'a mut MaybeUninit<U> is safe whenever the reference has sufficient size and alignment, and would make this sort of allocator more powerful because it would no longer be resticted to allocating finitely many types. (Doing this soundly on shared references also requires the interior mutabilities to match, and I also can't think of a use case for it in safe Rust – it might have uses in unsafe Rust.) So a good reason to have an arbitrary transmute_mut for MaybeUninit would be that it allows an important memory allocation primitive to be written entirely in safe Rust.

@RalfJung
Copy link
Member

This problem could be fixed either via adding a new safety requirement on assume_init_ref and friends that the MaybeUninit has not been converted from a MaybeUninit where types T and U differ in interior mutability (perhaps by considering a byte to be uninitialised if copied from a type with different interior mutability for that byte), or via banning the use of as_bytes if either the source or destination type has interior mutability.

IMO the fix is to say that for the duration that the reference returned by as_bytes is life, no (interior) mutation of the underlying memory may happen.

@wyfo
Copy link

wyfo commented Aug 17, 2024

What about pointer provenance? Converting &mut MaybeUninit<*const T> into &mut [MaybeUninit<u8>] looks a lot like a pointer-usize-pointer roundtrip, forbidden under strict provenance rules.

@ais523
Copy link

ais523 commented Aug 17, 2024

Although that conversion could be used to do a pointer-usize-pointer roundtrip, it doesn't actually violate any rules until you try to treat the resulting pointer as initialized.

A much simpler example of the same sort of thing is to convert an &mut MaybeUninit<bool> into &mut [MaybeUninit<u8>; 1], and write 2 to the memory through the new reference. You end up with an &mut MaybeUninit<bool> pointing to memory holding the value 2, which is not valid for a bool. But that doesn't matter, because there's no requirement that uninitialized memory has a valid value for the type.

In this case, the soundness requirement would be the usual "you musn't assume a MaybeUninit has been initialised unless it actually has a valid value for the data type". Under strict provenance, pointer values constructed without going through provenance APIs aren't valid – they can be held in a MaybeUnint<*const T> because you can hold any random bit pattern in those, but can't be converted into a *const T without writing a valid value to the memory first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants