Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

From<&[T]> for Rc creates underaligned reference #54908

Closed
Tracked by #10
RalfJung opened this issue Oct 8, 2018 · 22 comments
Closed
Tracked by #10

From<&[T]> for Rc creates underaligned reference #54908

RalfJung opened this issue Oct 8, 2018 · 22 comments
Labels
I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-medium Medium priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented Oct 8, 2018

The function allocate_for_ptr is passed a *const [T], which it turns into a &RcBox<[T]>. Unfortunately, the latter has an alignment of 8 even if T has smaller alignment. This leads to UB because we have a not-sufficiently-aligned reference.

This got introduced in #42565.

Found by miri (in my branch that verified the validity invariants).

Cc @murarth @Centril @aturon

@Centril Centril added P-high High priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Oct 8, 2018
@RalfJung
Copy link
Member Author

RalfJung commented Oct 8, 2018

This also affects Rc::from_box and the From<&str>, and likely for instances around converting things into Rc. And Arc has the same problem.

IMO the best fix is to make Layout::for_value take a raw pointer instead. Don't they even coerce from references so this is "almost" backwards-compatible (except inference)?

@RalfJung
Copy link
Member Author

RalfJung commented Oct 8, 2018

Oh wow this code is crazy. The "data field" it talks about is not some field of a struct, it's the pointer part of a fat pointer. That could have been made more clear...^^

@Centril
Copy link
Contributor

Centril commented Oct 8, 2018

cc @rust-lang/lang and @rust-lang/libs

@SimonSapin
Copy link
Contributor

"Almost backwards-compatible" is probably not good enough :) But we can add Layout::for_ptr. Either way, this function calls mem::size_of_val and mem::align_of_val so we’d need raw-pointer variants of those too. And figure out their safety. Can raw trait objects be assumed to always have a valid vtable pointer?

@RalfJung
Copy link
Member Author

RalfJung commented Oct 8, 2018

Can raw trait objects be assumed to always have a valid vtable pointer?

Good question. I am going to lobby for "yes", but no decision has been made yet.

@Centril
Copy link
Contributor

Centril commented Oct 8, 2018

The relevant code in one place (not including Arc):

impl From<String> for Rc<str> {
    #[inline]
    fn from(v: String) -> Rc<str> {
        Rc::from(&v[..])
    }
}

impl<'a, T: Clone> From<&'a [T]> for Rc<[T]> {
    #[inline]
    fn from(v: &[T]) -> Rc<[T]> {
        <Self as RcFromSlice<T>>::from_slice(v)
    }
}

impl<T: ?Sized> From<Box<T>> for Rc<T> {
    #[inline]
    fn from(v: Box<T>) -> Rc<T> {
        Rc::from_box(v)
    }
}


struct RcBox<T: ?Sized> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

impl<'a> From<&'a str> for Rc<str> {
    #[inline]
    fn from(v: &str) -> Rc<str> {
        let rc = Rc::<[u8]>::from(v.as_bytes());
        unsafe { Rc::from_raw(Rc::into_raw(rc) as *const str) }
    }
}

impl<T> From<Vec<T>> for Rc<[T]> {
    #[inline]
    fn from(mut v: Vec<T>) -> Rc<[T]> {
        unsafe {
            let rc = Rc::copy_from_slice(&v);

            // Allow the Vec to free its memory, but not destroy its contents
            v.set_len(0);

            rc
        }
    }
}

pub struct Rc<T: ?Sized> {
    ptr: NonNull<RcBox<T>>,
    phantom: PhantomData<T>,
}

trait RcFromSlice<T> {
    fn from_slice(slice: &[T]) -> Self;
}

impl<T: Clone> RcFromSlice<T> for Rc<[T]> {
    #[inline]
    default fn from_slice(v: &[T]) -> Self {
        // Panic guard while cloning T elements.
        // In the event of a panic, elements that have been written
        // into the new RcBox will be dropped, then the memory freed.
        struct Guard<T> {
            mem: NonNull<u8>,
            elems: *mut T,
            layout: Layout,
            n_elems: usize,
        }

        impl<T> Drop for Guard<T> {
            fn drop(&mut self) {
                use core::slice::from_raw_parts_mut;

                unsafe {
                    let slice = from_raw_parts_mut(self.elems, self.n_elems);
                    ptr::drop_in_place(slice);

                    Global.dealloc(self.mem, self.layout.clone());
                }
            }
        }

        unsafe {
            let v_ptr = v as *const [T];
            let ptr = Self::allocate_for_ptr(v_ptr);

            let mem = ptr as *mut _ as *mut u8;
            let layout = Layout::for_value(&*ptr);

            // Pointer to first element
            let elems = &mut (*ptr).value as *mut [T] as *mut T;

            let mut guard = Guard{
                mem: NonNull::new_unchecked(mem),
                elems: elems,
                layout: layout,
                n_elems: 0,
            };

            for (i, item) in v.iter().enumerate() {
                ptr::write(elems.add(i), item.clone());
                guard.n_elems += 1;
            }

            // All clear. Forget the guard so it doesn't free the new RcBox.
            forget(guard);

            Rc { ptr: NonNull::new_unchecked(ptr), phantom: PhantomData }
        }
    }
}

impl<T: Copy> RcFromSlice<T> for Rc<[T]> {
    #[inline]
    fn from_slice(v: &[T]) -> Self {
        unsafe { Rc::copy_from_slice(v) }
    }
}

impl<T> Rc<[T]> {
    // Copy elements from slice into newly allocated Rc<[T]>
    //
    // Unsafe because the caller must either take ownership or bind `T: Copy`
    unsafe fn copy_from_slice(v: &[T]) -> Rc<[T]> {
        let v_ptr = v as *const [T];
        let ptr = Self::allocate_for_ptr(v_ptr);

        ptr::copy_nonoverlapping(
            v.as_ptr(),
            &mut (*ptr).value as *mut [T] as *mut T,
            v.len());

        Rc { ptr: NonNull::new_unchecked(ptr), phantom: PhantomData }
    }
}

impl<T: ?Sized> Rc<T> {
    // Allocates an `RcBox<T>` with sufficient space for an unsized value
    unsafe fn allocate_for_ptr(ptr: *const T) -> *mut RcBox<T> {
        // Create a fake RcBox to find allocation size and alignment
        let fake_ptr = ptr as *mut RcBox<T>;

        let layout = Layout::for_value(&*fake_ptr);

        let mem = Global.alloc(layout)
            .unwrap_or_else(|_| handle_alloc_error(layout));

        // Initialize the real RcBox
        let inner = set_data_ptr(ptr as *mut T, mem.as_ptr() as *mut u8) as *mut RcBox<T>;

        ptr::write(&mut (*inner).strong, Cell::new(1));
        ptr::write(&mut (*inner).weak, Cell::new(1));

        inner
    }

    fn from_box(v: Box<T>) -> Rc<T> {
        unsafe {
            let box_unique = Box::into_unique(v);
            let bptr = box_unique.as_ptr();

            let value_size = size_of_val(&*bptr);
            let ptr = Self::allocate_for_ptr(bptr);

            // Copy value as bytes
            ptr::copy_nonoverlapping(
                bptr as *const T as *const u8,
                &mut (*ptr).value as *mut _ as *mut u8,
                value_size);

            // Free the allocation without dropping its contents
            box_free(box_unique);

            Rc { ptr: NonNull::new_unchecked(ptr), phantom: PhantomData }
        }
    }
}

// Sets the data pointer of a `?Sized` raw pointer.
//
// For a slice/trait object, this sets the `data` field and leaves the rest
// unchanged. For a sized raw pointer, this simply sets the pointer.
unsafe fn set_data_ptr<T: ?Sized, U>(mut ptr: *mut T, data: *mut U) -> *mut T {
    ptr::write(&mut ptr as *mut _ as *mut *mut u8, data as *mut u8);
    ptr
}

@RalfJung
Copy link
Member Author

RalfJung commented Oct 8, 2018

@Centril asked me to clarify that the alignment I am talking about here is the one of the actual data part pointed to by the fat pointer: A &[u32] has a data ptr aligned to 4, and the code here essentially transmutes that into a &RcBox<[u32]> (not to deref, that would be outrageously UB, but just to get the DST information) -- however, &RcBox<[u32]> expects its data ptr to be 8-aligned.

@Centril Centril added P-medium Medium priority and removed P-high High priority labels Oct 8, 2018
@Centril
Copy link
Contributor

Centril commented Oct 8, 2018

@RalfJung noted that this is unlikely to lead to miscompilation so I'm lowering the prio a bit.

@arielb1
Copy link
Contributor

arielb1 commented Oct 8, 2018

The code originally used let ptr: *mut RcBox<[T]> = mem::transmute([mem::align_of::<RcBox<[T; 1]>>(), value.len()]);, which didn't have this problem.

@arielb1
Copy link
Contributor

arielb1 commented Oct 8, 2018

Can raw trait objects be assumed to always have a valid vtable pointer?

We are going to support object-safe arbitrary self types, so at least in that case they need to be valid for safety (f not for non-UB).

@murarth
Copy link
Contributor

murarth commented Oct 9, 2018

I am submitting a PR that resolves this issue by manually calculating the layout of RcBox<T> in allocate_for_ptr, using the layout of RcBox<()> and the layout of the given unsized value.

The code is similar to that used in Rc::from_raw to compute the address of RcBox<T> from the address of the contained T.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2018

I guess that also works, but I think we should also have a way to compute the size and alignment of an unaligned raw pointer.

@SimonSapin Inference changes are excluded from stability, right? Is there precedence for changing a function argument type to something that coerces to the original type?

@SimonSapin
Copy link
Contributor

I can’t think of such a precedent out of hand, but calling them is not the only thing that can be done with functions. Passing it to generic code that has a F: Fn(Foo) -> Bar bound would break if we change the Foo parameter type.

I think I missed something in how that relates to this thread, though. What API change are you suggesting?

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2018

What API change are you suggesting?

Changing size_of_val, align_of_val and Layout::for_val to take *const instead of & (after we decided that raw fat pointers must always have valid metadata, which is needed to make this a safe operation).

@SimonSapin
Copy link
Contributor

This feels risky to me. @rust-lang/libs what do you think?

@SimonSapin
Copy link
Contributor

I’d prefer to add new functions that take *const T rather than modify the stable ones.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2018

I can see that. The stable ones should get warnings though about these references being subject to all the usual validity requirements for references. Might be worth deprecating? Or, at least, have a lint in clippy that complains when they are used with anything involving a raw pointer.

@SimonSapin
Copy link
Contributor

Do you mean warn when they are used with a reference that was just created unsafely from a raw pointer? There’s nothing wrong with using those functions in safe code with valid references, is there? I’m not sure about deprecating, especially if the new pointer-based variants need to be unsafe fns.

@RalfJung
Copy link
Member Author

RalfJung commented Oct 9, 2018

Do you mean warn when they are used with a reference that was just created unsafely from a raw pointer?

Something lime that, yes.

especially if the new pointer-based variants need to be unsafe fns.

Oh sure, that was under the assumption that we can make them safe.

@alexcrichton
Copy link
Member

Yes I agree with Simon that we shouldn't change the stable apis, and if there's a decision to emit wabings or deprecate let's have a dedicated thread for that rather than being buried in the discussion here

bors added a commit that referenced this issue Nov 5, 2018
Fix undefined behavior in Rc/Arc allocation

Manually calculate allocation layout for `Rc`/`Arc` to avoid undefined behavior

Closes #54908
@comex
Copy link
Contributor

comex commented Nov 17, 2018

Just found this by way of Ralf’s latest stacked borrows post. It seems to me that alignment is a red herring. The question is, should size_of_val and align_of_val be allowed to read from the data pointer of the passed-in reference? The current DSTs, slices and trait objects, only need the metadata to determine the size, but with custom DSTs in the future that might not always be the case: for example, a “thin vtable” DST as proposed here would need to fetch the vtable from the data pointer, and a DST version of CStr might want its implementation of size_of_val to call strlen.

If the answer is yes, then it doesn’t generally make sense to fake a pointer and then call size_of_val on it, like the Rc code does. The generic size_of_val API (which is supposed to work for all types) should only accept valid references, and requiring the pointer to be aligned is just a part of validity. In theory, there could be a separate trait for types that can calculate size_of_val and align_of_val given only metadata, but such a trait probably wouldn’t be worth its weight. It would probably be best to just add standalone methods/functions for specific types of DSTs; e.g. the raw::TraitObject API could have methods to query size and alignment.

If the answer is no, on the other hand, then you shouldn’t have to pass a data pointer at all, never mind alignment; it would probably be best to base this on the proposed API for “manipulating the metadata for fat pointers” (same link as above), and add a way to request size and alignment directly from the metadata value itself. But I wouldn’t recommend this route, since as I mentioned, it would be incompatible with multiple use cases for custom DSTs.

@RalfJung
Copy link
Member Author

should size_of_val and align_of_val be allowed to read from the data pointer of the passed-in reference? The current DSTs, slices and trait objects, only need the metadata to determine the size, but with custom DSTs in the future that might not always be the case: for example, a “thin vtable” DST as proposed here would need to fetch the vtable from the data pointer, and a DST version of CStr might want its implementation of size_of_val to call strlen.

Good point! I think I agree. Maybe we should not propose a raw-ptr-based version of these functions quite yet. ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-medium Medium priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants