Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vec::push invalidates interior references even when it does not reallocate #60847

Closed
RalfJung opened this issue May 15, 2019 · 15 comments · Fixed by #61114
Closed

Vec::push invalidates interior references even when it does not reallocate #60847

RalfJung opened this issue May 15, 2019 · 15 comments · Fixed by #61114
Labels
A-collections Area: `std::collection` C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented May 15, 2019

To my knowledge, the following code is intended to be legal:

fn main() {
    let mut v = Vec::with_capacity(10);
    v.push(0);
    let v0 = unsafe { &*(&v[0] as *const _) }; // laundering the lifetime -- we take care that `v` does not reallocate, so that's okay.
    v.push(1);
    let _val = *v0;
}

However, Miri currently flags this as UB. The reason is that Vec::push implicitly (through auto-deref) calls deref_mut to create a &mut [T] covering all elements in the vector, which overlaps with our reference v0 in the example code above. Overlapping shared and mutable references are prohibited by Stacked Borrows.

@matklad
Copy link
Member

matklad commented May 15, 2019

Specifically, code for push looks like this:

    pub fn push(&mut self, value: T) {
        if self.len == self.buf.cap() {
            self.reserve(1);
        }
        unsafe {
            let end = self.as_mut_ptr().add(self.len);
            ptr::write(end, value);
            self.len += 1;
        }
    }

and it is self.as_mut_ptr() line that causes problems: as_mut_ptr is a method on [T], so it's the implicit deref that creates a reference.

There's probably a bunch of other methods with similar problems. For example, fn remove(&mut self, index: usize) also invalidates all references, and not just the [index..] slice

@RalfJung
Copy link
Member Author

Good point, I forgot to mention that the call is implicit. I amended my original post.

@lnicola
Copy link
Member

lnicola commented May 15, 2019

Vec doesn't seem to have a test suite. But if it did, would it be possible to run the standard library tests under miri?

@RalfJung
Copy link
Member Author

We already do. :)

Specifically, this runs the #[test] functions in src/libcore/tests, src/liballoc/tests and src/liballoc.

@jonas-schievink jonas-schievink added A-collections Area: `std::collection` C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels May 15, 2019
@RalfJung
Copy link
Member Author

I just noticed push has another, related problem: as mentioned above, the raw pointer it creates via self.as_mut_ptr() is created from an &[T]. This slice covers only the already existing elements of the vector, but the pointer arithmetic goes beyond the end of that slice! This is illegal, just like the following code is illegal:

fn main() { unsafe {
    let x: &[u8] = &[1,2,3];
    let x_01 = &x[0..2];
    let raw = x_01.as_ptr();
    let _val = raw.add(2).read(); // pointer is beyond the bounds of the slice it was created from!
} }

This does not currently get caught by Miri because of imprecision around our handling of raw pointers.

Centril added a commit to Centril/rust that referenced this issue May 25, 2019
Vec: avoid creating slices to the elements

Instead of `self.deref_mut().as_mut_ptr()` to get a raw pointer to the buffer, use `self.buf.ptr_mut()`. This (a) avoids creating a unique reference to all existing elements without any need, and (b) creates a pointer that can actually be used for the *entire* buffer, and not just for the part of it covered by `self.deref_mut()`.

I also got worried about `RawVec::ptr` returning a `*mut T` from an `&self`, so I added both a mutable and an immutable version.

Cc @gankro in particular for the `assume` changes -- I don't know why that is not in `Unique`, but I moved it up from `Vec::deref` to `RawVec::ptr` to avoid having to repeat it everywhere.

Fixes rust-lang#60847
Centril added a commit to Centril/rust that referenced this issue May 26, 2019
Vec: avoid creating slices to the elements

Instead of `self.deref_mut().as_mut_ptr()` to get a raw pointer to the buffer, use `self.buf.ptr_mut()`. This (a) avoids creating a unique reference to all existing elements without any need, and (b) creates a pointer that can actually be used for the *entire* buffer, and not just for the part of it covered by `self.deref_mut()`.

I also got worried about `RawVec::ptr` returning a `*mut T` from an `&self`, so I added both a mutable and an immutable version.

Cc @gankro in particular for the `assume` changes -- I don't know why that is not in `Unique`, but I moved it up from `Vec::deref` to `RawVec::ptr` to avoid having to repeat it everywhere.

Fixes rust-lang#60847
@vakaras
Copy link
Contributor

vakaras commented May 24, 2021

Making the motivating example legal seems to also make the following example legal (playground):

fn main() {
    let mut v = vec![1, 2, 3, 4, 5];
    v.pop();
    let ptr = v.as_mut_ptr();
    let ptr = unsafe { &mut *ptr };
    test(&mut v, ptr);
    println!("{:?}", v);    // [8, 2, 3, 4, 7]
}
fn test(v: &mut Vec<u32>, x: &mut u32) {
    v.push(7);
    *x = 8;
}

This looks very wrong for me. It also completely breaks Prusti… Isn't there another way to fix whatever code depends on this?

@RalfJung
Copy link
Member Author

What do you mean by "it breaks Prusti"?

@vakaras
Copy link
Contributor

vakaras commented May 24, 2021

Our current encoding assumes that the parameters v and x cannot alias (we have v * x in separation logic). So, you either get a verification error or an unsoundness.

@RalfJung
Copy link
Member Author

RalfJung commented May 25, 2021

I think that model is incomplete, but that seems fine to me (unless you want to claim your tool is total). After all, one could also easily write code like

fn test(v: &mut *mut u32, x: mut u32) {
  unsafe { *v.add(1) = 1; }
  *x = 2;
}

and call this function with *v and x having the same value -- and if you are a bit careful with the way you create these pointers, that is totally fine:

let mut arr = [1u32, 2];
let v = arr.as_mut_ptr();
test(v, unsafe { &mut *v });

This is, conceptually, what happens in your example as well.

@RalfJung
Copy link
Member Author

RalfJung commented May 25, 2021

Put differently, in your example, main cannot call an arbitrary function test with the same type this way, that could go wrong badly. It has to know what test does for this to be sound. So it seems expected to me that the type signature of test alone is not sufficient to derive the precondition that main satisfies when calling test.

Essentially, test is what one may call a "super-safe" function: the actual precondition required to call test is weaker than what the types insinuate. (This is in contrast to unsafe functions whose actual precondition is stronger than what the types insinuate.) Code that wants to exploit this clearly requires custom annotations to inform the prover of this fact. This pattern can (and, I am sure, will) arise all over the place, so I don't think there is anything wrong with it also coming up around Vec.

(My example uses raw ptrs so it does not have a "super-safe" function; maybe that makes it a bad example.)

@vakaras
Copy link
Contributor

vakaras commented May 25, 2021

Just to avoid potential misunderstanding: here I am interested in the semantics of Vec. In other words, I think that my example (as well as the original example) should be rejected because they violate the invariant of Vec. More specifically, Vec internally uses Unique, whose documentation says that:

Unlike *mut T, Unique<T> behaves "as if" it were an instance of T.

My mental image of this sentence is that I should treat the following struct Foo:

struct Foo {
    f: Unique<u32>,
}

as if it was:

struct Foo {
    f: u32,
}

And, therefore, be able to conclude that x and y do not alias:

fn bar(x: &mut Foo, y: &mut u32) { ... }

Does this make sense to you? Or maybe you see a flaw in my reasoning?

@vakaras
Copy link
Contributor

vakaras commented May 25, 2021

If I understand you correctly, super safe functions are the functions whose callers are allowed to call them with arguments that have broken invariants, right? My gut feeling says that such functions could result in all type invariants becoming public (I remember seeing this problem popping up with C++ friend functions), which makes reasoning really hard (not only for automatic verification but also for humans). Do you know whether anyone tried to incorporate visibility into the RustBelt proofs (prove that all code that needs to open a type invariant has access to all fields that are mentioned in it)?

By the way, I have a feeling that examples like the ones that motivated this issue are coming from the fact that programmers are (mis)using Vec as a way to allocate memory on the heap. Wouldn't it be better to provide a data structure that serves exactly that purpose? I guess RawVec could be a good starting point.

@RalfJung
Copy link
Member Author

Vec internally uses Unique

This is a good point. However, Unique is internal and unstable for a reason. We still need to figure out what exactly its rules are. The documentation you cite should likely be updated, but you are also quoting it out-of-context: reading the full text, I think what is meant by this is that for the purpose of the type system, Unique<T> behaves like T. For example, auto traits and dropck will all consider this type to contain owned T. But Unique<T> is certainly different from Box<T>, so this analogy does not carry over to ownership. (This becomes particularly clear when considering empty Vec.)

I think your test does not violate Unique since the unique pointer and x are never used to access the same memory. This matches the requirements LLVM imposes on noalias, which is what Unique is meant to model.

I think that my example (as well as the original example) should be rejected because they violate the invariant of Vec.

I would phrase this slightly differently, namely -- these examples exploit deep knowledge of the invariant of Vec. They break the abstraction barrier, taking apart the ownership of the backing buffer. But they don't actually violate the underlying ownership discipline.

For most types, breaking the abstraction barrier like this would be considered a bug. The exact invariant is an internal implementation detail. Vec, however, is a bit special in that it quite explicitly exposes a lot of its implementation details in the documentation. Due to this, I think code like your example is justified.

By the way, I have a feeling that examples like the ones that motivated this issue are coming from the fact that programmers are (mis)using Vec as a way to allocate memory on the heap. Wouldn't it be better to provide a data structure that serves exactly that purpose? I guess RawVec could be a good starting point.

I think Vec is actually meant to be used this way, so it does not constitute a misuse. But this is just me interpreting the docs to infer the intention of the authors, so don't take my word for it.

@RalfJung
Copy link
Member Author

Btw, discussions in closed issues are generally discouraged and easily get lost. You are raising some interesting questions though. Could you open a new issue and provide a short summary? Then we could invite some t-libs people to determine whether this is considered within the intended usage of Vec or not.

@vakaras
Copy link
Contributor

vakaras commented May 25, 2021

Sure. I will do that.

Edit: the explanation of why this is needed and allowed can be found in this comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collection` C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants