Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

!Send Future violates stacked borrow rules #3796

Closed
coder137 opened this issue Aug 8, 2024 · 14 comments
Closed

!Send Future violates stacked borrow rules #3796

coder137 opened this issue Aug 8, 2024 · 14 comments
Labels
C-support Category: Not necessarily a bug, but someone asking for support

Comments

@coder137
Copy link

coder137 commented Aug 8, 2024

Tokio issue: tokio-rs/tokio#6750
Futures issue: rust-lang/futures-rs#2877

Explanation

The tests only fail when !Send future is used along with tokio::join! or futures::join!

Miri message

error: Undefined Behavior: trying to retag from <204081> for SharedReadWrite permission at alloc61855[0x88], but that tag does not exist in the borrow stack for this location
   --> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.38.1/src/sync/mpsc/bounded.rs:242:22
    |
242 |         poll_fn(|cx| self.chan.recv(cx)).await
    |                      ^^^^^^^^^
    |                      |
    |                      trying to retag from <204081> for SharedReadWrite permission at alloc61855[0x88], but that tag does not exist in the borrow stack for this location
    |                      this error occurs as part of two-phase retag at alloc61855[0x88..0x90]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <204081> was created by a Unique retag at offsets [0x88..0x90]
   --> tests/tokio_tests.rs:13:26
    |
13  |             let (a, b) = tokio::join!(rx1.recv(), rx2.recv());
    |                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: <204081> was later invalidated at offsets [0x58..0xd8] by a SharedReadOnly retag
   --> /home/runner/work/ticked-async-executor/ticked-async-executor/src/ticked_async_executor.rs:[115](https://github.com/coder137/ticked-async-executor/actions/runs/10259175195/job/28383207204#step:8:116):17
    |
115 |                 runnable.run();
    |                 ^^^^^^^^^^^^^^

The relevant debug logs are in the individual issues linked above

@RalfJung
Copy link
Member

RalfJung commented Aug 8, 2024

Isn't this a duplicate of #3780?

If not, we'll need a self-contained reproducing example, ideally as small as possible. Send makes no difference at all for Miri so something else has to be going on.

@coder137
Copy link
Author

For #3780, I tried the recommended fix in tokio: tokio-rs/tokio#6744
Reference: coder137/ticked-async-executor@6cc5e9d#diff-2e9d962a08321605940b5a657135052fbcef87b5e360662bb527c96d9a615542

Does not fix the issue.
Also, for some reason when using async_task the Send variant does not cause a problem but the !Send variant does.

@coder137
Copy link
Author

coder137 commented Aug 10, 2024

To replicate the issue run the following:

#[test]
fn test_local_future() {
    let (tx1, mut rx1) = tokio::sync::mpsc::channel::<usize>(1);
    let (tx2, mut rx2) = tokio::sync::mpsc::channel::<usize>(1);

    let (runnable_tx, runnable_rx) = std::sync::mpsc::channel();
    let (runnable, task) = async_task::spawn_local(
        async move {
            let (a, b) = tokio::join!(rx1.recv(), rx2.recv());
            assert_eq!(a, Some(10));
            assert_eq!(b, Some(20));
        },
        move |runnable| {
            runnable_tx.send(runnable).unwrap();
        },
    );
    runnable.schedule();
    task.detach();

    let tick_cb = || {
        runnable_rx.try_iter().for_each(|runnable| {
            runnable.run();
        });
    };

    tx1.try_send(10).unwrap();
    tick_cb();
    tick_cb();

    tx2.try_send(20).unwrap();
    tick_cb();
    tick_cb();
}

Toml file

...
[dependencies]
async-task = "4.7"
tokio = { version = "1", features = ["full"] }

@RalfJung RalfJung added the C-support Category: Not necessarily a bug, but someone asking for support label Aug 10, 2024
@RalfJung
Copy link
Member

I am not sure when I will have time to dig into this and figure out what exactly is happening. Meanwhile, if someone wants to help it'd be great to have this example minimized further until it does not have any dependencies any more.

@tiif
Copy link
Contributor

tiif commented Aug 12, 2024

An attempt to minimize the example:

fn main() {
    let (tx1, mut rx1) = tokio::sync::mpsc::channel::<usize>(1);

    let (runnable_tx, runnable_rx) = std::sync::mpsc::channel();
    let (runnable, _task) = async_task::spawn_local(
        async move {
            rx1.recv().await;
        },
        move |runnable| {
            runnable_tx.send(runnable).unwrap();
        },
    );
    runnable.schedule();

    runnable_rx.iter().next().unwrap().run();
    tx1.try_send(20).unwrap();
    runnable_rx.iter().next().unwrap().run();
}

Note:
Similar error will be returned for both tokio::sync::mpsc::channel::<usize>(1) and tokio::sync::mpsc::unbounded_channel::<usize>().

It'd be nice if we can get rid of

    let (runnable, _task) = async_task::spawn_local(
        async move {
            rx1.recv().await;
        },
        move |runnable| {
            runnable_tx.send(runnable).unwrap();
        },
    );

so we can remove the async-task dependency. I think this should be doable with just tokio but I am not sure how yet.

@coder137
Copy link
Author

coder137 commented Aug 16, 2024

Tested spawn_local with a mpmc channel and miri test passes

Seems like if recv channel has a &mut self requirement thats when this violation occurs.

Alternative example (without channels, but &mut self requirement and mutation after await)

struct Data {
    data: usize,
}

impl Data {
    pub async fn mut_fn(&mut self) {
        self.data += 1;
        tokio::task::yield_now().await;
        self.data += 2; // This causes the violation
    }
}

#[test]
fn test_local_future_min() {
    let mut my_test = Data { data: 10 };

    let (runnable_tx, runnable_rx) = std::sync::mpsc::channel();
    let (runnable, task) = async_task::spawn_local(
        async move {
            my_test.mut_fn().await;
        },
        move |runnable| {
            runnable_tx.send(runnable).unwrap();
        },
    );
    runnable.schedule();
    task.detach();

    runnable_rx.iter().next().unwrap().run();
    runnable_rx.iter().next().unwrap().run();
}

@coder137
Copy link
Author

coder137 commented Aug 16, 2024

@RalfJung @tiif This seems like a duplicate of #3780 afaik.
Does this mean that the issue is upstream: rust-lang/rust#63818?

@RalfJung
Copy link
Member

This seems like a duplicate of #3780 afaik.
Does this mean that the issue is upstream: rust-lang/rust#63818?

That is my theory. It's hard to test... or is there some easy way to make Coroutine not have any niches?
Cc @compiler-errors

@compiler-errors
Copy link
Member

@RalfJung: Are you asking for, e.g., a way for a coroutine to emit a totally flat layout, where every field is at a different offset for testing? That should be possible.

@RalfJung
Copy link
Member

I was originally thinking of the niches of the coroutine itself, so that e.g. Option<Coroutine> can't store the discriminant somewhere in the coroutine.

But your question made me realize that coroutines have their own discriminant, and that one should also not be using a niche for this.

Having fields overlap is fine otherwise.

@RalfJung
Copy link
Member

This error persists even with rust-lang/rust#129313. So, it's not the same problem as #3780.

@RalfJung
Copy link
Member

RalfJung commented Aug 20, 2024

I think I found a minimized version of the same problem:

use std::{
    future::Future,
    pin::Pin,
    sync::Arc,
    task::{Context, Poll, Wake},
};

struct ThingAdder<'a> {
    thing: &'a mut String,
}

impl Future for ThingAdder<'_> {
    type Output = ();

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        unsafe {
            *self.get_unchecked_mut().thing += ", world";
        }
        Poll::Pending
    }
}

fn force_ref(_x: &impl Future) {}

fn main() {
    let mut thing = "hello".to_owned();
    let fut = async move { ThingAdder { thing: &mut thing }.await };

    let mut fut = fut;
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };

    let waker = Arc::new(DummyWaker).into();
    let mut ctx = Context::from_waker(&waker);
    assert_eq!(fut.as_mut().poll(&mut ctx), Poll::Pending);
    force_ref(&*fut); // Create a shared ref to the future itself, causing an aliasing model "fake" read.
    assert_eq!(fut.as_mut().poll(&mut ctx), Poll::Pending);
}

struct DummyWaker;

impl Wake for DummyWaker {
    fn wake(self: Arc<Self>) {}
}

At least, this also errors with very similar symptoms (on terms of which kinds of references get created and invalidated in which order by which other accesses). My version also errors in Tree Borrows, unlike the original, but I think that is because the original happens to not write through a mutable reference again when it could.

What's happening is the following:

  • We are setting up a self-referential future, i.e. we have a !Unpin type with two fields where one field is a mutable reference to the other. The mutable reference has type, say, &mut String and is hence subject to all the usual aliasing constraints.
  • Now we are calling Pin::deref(). This safe operation creates a shared reference to the entire future. Since there is no UnsafeCell, this acts like a read of the entire future. That's bad news for our mutable reference -- having another pointer do a read clearly violates the aliasing requirements of this reference.
  • The next time this reference is used, we have UB.

In the example above, the problematic Pin::deref call is caused by the self.id field access here. However, that code is really not to blame. Reading the id field is perfectly fine, but for a brief moment there exists a reference to the entire Checked<F>, which covers also the inner field, and that is causing the problems. Wrapping inner in an UnsafeCell would probably help but that's a terrible hack.

I think my conclusion is that such self-referenced fields must be treated like an UnsafeCell, i.e. we can't assume they are read-only. This is not a necessary property of aliasing with mutable reference, but it is a necessary consequence of the decision that Pin::deref should be safe.

@RalfJung
Copy link
Member

RalfJung commented Aug 20, 2024

So, I will close this as yet another duplicate of rust-lang/rust#63818 that will be resolved if/when someone implements UnsafePinned. Thanks for bringing this to our attention!

@RalfJung
Copy link
Member

RalfJung commented Aug 20, 2024

FWIW, there is a possible work-around we could apply in Miri: to treat all !Unpin coroutine types as-if they were an UnsafeCell. However, codegen would still emit noalias for such types so to ensure we don't accept code that has LLVM UB, we'd also need to make the compiler generate impl !Freeze for all coroutines that are !Unpin.

But TBH I feel like piling more and more hacks on top of each other to avoid the consequences of rust-lang/rust#63818 is a waste of time. We know what the proper fix is, we just need to find someone willing and able to implement it. :) It is time to actually finish the so-far only half-finished integration of self-referential corotuines into the language.

And in the future I hope we can avoid stabilizing features when they have such gaping holes...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-support Category: Not necessarily a bug, but someone asking for support
Projects
None yet
Development

No branches or pull requests

4 participants