Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stacked Borrows violation in macOS RwLock #121626

Closed
RalfJung opened this issue Feb 26, 2024 · 11 comments
Closed

Stacked Borrows violation in macOS RwLock #121626

RalfJung opened this issue Feb 26, 2024 · 11 comments
Labels
I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

Today's run of the std test suite in Miri failed on macOS:

  3.314536   test sync::rwlock::tests::frob ... error: Undefined Behavior: trying to retag from <42368726> for SharedReadOnly permission at alloc9742021[0x0], but that tag does not exist in the borrow stack for this location
  0.000053      --> /home/runner/work/miri-test-libstd/miri-test-libstd/rust-src-patched/library/core/src/ptr/non_null.rs:401:18
  0.000015       |
  0.000009   401 |         unsafe { &*self.as_ptr().cast_const() }
  0.000010       |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009       |                  |
  0.000010       |                  trying to retag from <42368726> for SharedReadOnly permission at alloc9742021[0x0], but that tag does not exist in the borrow stack for this location
  0.000008       |                  this error occurs as part of retag at alloc9742021[0x0..0x30]
  0.000009       |
  0.000009       = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
  0.000009       = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
  0.000010   help: <42368726> was created by a SharedReadOnly retag at offsets [0x0..0x8]
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:348:32
  0.000009       |
  0.000009   348 |                 let mut next = ptr::from_ref(&node)
  0.000009       |                                ^^^^^^^^^^^^^^^^^^^^
  0.000011   help: <42368726> was later invalidated at offsets [0x8..0x10] by a write access
  0.000010      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:347:17
  0.000008       |
  0.000009   347 |                 node.prev = AtomicLink::new(None);
  0.000009       |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009       = note: BACKTRACE (of the first span):
  0.000009       = note: inside `core::ptr::NonNull::<sys::locks::rwlock::queue::Node>::as_ref::<'_>` at /home/runner/work/miri-test-libstd/miri-test-libstd/rust-src-patched/library/core/src/ptr/non_null.rs:401:18: 401:46
  0.000009   note: inside `sys::locks::rwlock::queue::add_backlinks_and_find_tail`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:265:17
  0.000009       |
  0.000009   265 |                 next.as_ref().prev.set(Some(current));
  0.000009       |                 ^^^^^^^^^^^^^
  0.000008   note: inside `sys::locks::rwlock::queue::RwLock::unlock_queue`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:487:33
  0.000009       |
  0.000008   487 |             let tail = unsafe { add_backlinks_and_find_tail(to_node(state)) };
  0.000009       |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sys::locks::rwlock::queue::RwLock::lock_contended`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:383:25
  0.000009       |
  0.000008   383 |                         self.unlock_queue(next);
  0.000009       |                         ^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sys::locks::rwlock::queue::RwLock::write`
  0.000010      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:312:13
  0.000009       |
  0.000009   312 |             self.lock_contended(true)
  0.000009       |             ^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sync::rwlock::RwLock::<()>::write`
  0.000009      --> std_miri_test/../library/std/src/sync/rwlock.rs:361:13
  0.000009       |
  0.000009   361 |             self.inner.write();
  0.000009       |             ^^^^^^^^^^^^^^^^^^
  0.000010   note: inside closure
  0.000011      --> std_miri_test/../library/std/src/sync/rwlock/tests.rs:37:26
  0.000009       |
  0.000010   37  |                     drop(r.write().unwrap());
  0.000008       |                          ^^^^^^^^^

The diff to the last successful run is this, but since this is a concurrency test, there's a high chance that the bug already existed before and only surfaces randomly. So #110211 is the most likely culprit.

So far I haven't managed to reproduce this outside the context of the standard library test suite.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 26, 2024
@RalfJung
Copy link
Member Author

Looking at the code, what seems to be happening is that one iteration through the loop we create a shared reference here

let mut next = ptr::from_ref(&node)
.map_addr(|addr| addr | QUEUED | (state.addr() & LOCKED))
as State;

I am surprised that the diagnostic says "created by a SharedReadOnly retag at offsets [0x0..0x8]". This type does have interior mutability, so parts of the retag should be SharedReadWrite. This is the type in question:

#[repr(align(8))]
struct Node {
next: AtomicLink,
prev: AtomicLink,
tail: AtomicLink,
write: bool,
thread: OnceCell<Thread>,
completed: AtomicBool,
}

Everything except the bool here is an UnsafeCell. So how can there be 8 bytes of SharedReadOnly (i.e., outside UnsafeCell)?

miri-test-libstd uses -Zrandomize-layout. Maybe that has something to do with it. Maybe that's why we couldn't reproduce it yet.

@RalfJung
Copy link
Member Author

So with random layout, I guess the bool field comes first and then an 8-aligned field follows, meaning we have 8 bytes of "not interior mutable" memory at the start. So far, so good.

Then apparently later it got invalidated at offsets 8 to 16 here

node.prev = AtomicLink::new(None);

So that pointer must be the second field.
Then later we are using the original shared reference again and 💥 .

I think this is morally equivalent to the following:

use std::cell::UnsafeCell;

fn main() {
    let mut c = UnsafeCell::new(42);
    let ptr = c.get();
    c = UnsafeCell::new(13);
    unsafe { ptr.read(); }
}

This is rejected by Miri with Stacked Borrows but accepted with Tree Borrows.

The "blessed" way to do this is to never mix direct accesses to a local with accesses through derived pointers. Fixing that in queue.rs shouldn't be too hard.

bors added a commit to rust-lang/miri that referenced this issue Feb 26, 2024
tree borrows: add a test to sb_fails

This is something that happens in the wild (rust-lang/rust#121626), so TB accepting this is good. Let's make sure we notice if this ever changes.
@RalfJung
Copy link
Member Author

From what @joboet said, the actual bug is that there's anyone even still using the old reference when we are doing the write that does the invalidation.

According to the Miri log, we have these events in this order:

A reference gets created and turned into a raw pointer

let mut next = ptr::from_ref(&node)
.map_addr(|addr| addr | QUEUED | (state.addr() & LOCKED))
as State;

The memory that reference points to is written to via direct access to the local variable

node.prev = AtomicLink::new(None);

Someone uses the pointer derived from the reference

next.as_ref().prev.set(Some(current));

We have a full backtrace only for the last event.

Backtrace
  0.000009   note: inside `sys::locks::rwlock::queue::add_backlinks_and_find_tail`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:265:17
  0.000009       |
  0.000009   265 |                 next.as_ref().prev.set(Some(current));
  0.000009       |                 ^^^^^^^^^^^^^
  0.000008   note: inside `sys::locks::rwlock::queue::RwLock::unlock_queue`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:487:33
  0.000009       |
  0.000008   487 |             let tail = unsafe { add_backlinks_and_find_tail(to_node(state)) };
  0.000009       |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sys::locks::rwlock::queue::RwLock::lock_contended`
  0.000009      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:383:25
  0.000009       |
  0.000008   383 |                         self.unlock_queue(next);
  0.000009       |                         ^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sys::locks::rwlock::queue::RwLock::write`
  0.000010      --> std_miri_test/../library/std/src/sys/locks/rwlock/queue.rs:312:13
  0.000009       |
  0.000009   312 |             self.lock_contended(true)
  0.000009       |             ^^^^^^^^^^^^^^^^^^^^^^^^^
  0.000009   note: inside `sync::rwlock::RwLock::<()>::write`
  0.000009      --> std_miri_test/../library/std/src/sync/rwlock.rs:361:13
  0.000009       |
  0.000009   361 |             self.inner.write();
  0.000009       |             ^^^^^^^^^^^^^^^^^^
  0.000010   note: inside closure
  0.000011      --> std_miri_test/../library/std/src/sync/rwlock/tests.rs:37:26
  0.000009       |
  0.000010   37  |                     drop(r.write().unwrap());
  0.000008       |                          ^^^^^^^^^

@jieyouxu jieyouxu added I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Feb 26, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Feb 26, 2024
@apiraino
Copy link
Contributor

AFAIK T-libs doesn't use the I-prioritize label, so probably fine to remove it.

@rustbot label -I-prioritize

@rustbot rustbot removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Feb 26, 2024
@RalfJung
Copy link
Member Author

RalfJung commented Feb 26, 2024

I did managed to reproduce this on my machine once with that flag (with seed 2852 on Miri bbabee144ea12b210a7be9488f0eeed9efe3e24d), but when I run the testcase again with that seed it doesn't happen again.

This is the test:

use rand::Rng;
use std::sync::mpsc::channel;
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    const N: u32 = 10;
    const M: usize = if cfg!(miri) { 100 } else { 1000 };

    let r = Arc::new(RwLock::new(()));

    let (tx, rx) = channel::<()>();
    for _ in 0..N {
        let tx = tx.clone();
        let r = r.clone();
        thread::spawn(move || {
            let mut rng = rand::thread_rng();
            for _ in 0..M {
                if rng.gen_bool(1.0 / (N as f64)) {
                    drop(r.write().unwrap());
                } else {
                    drop(r.read().unwrap());
                }
            }
            drop(tx);
        });
    }
    drop(tx);
    let _ = rx.recv();
}

Full command I used:

CARGO_EXTRA_FLAGS="--profile=release" MIRI_SEED_START=2189 MIRI_SEED_END=4096 MIRIFLAGS="-Zrandomize-layout -Zmiri-strict-provenance -Zmiri-disable-isolation" ./miri many-seeds ./miri run --dep rwlock.rs --target aarch64-apple-darwin

So... it uses actual randomness, and thus of course it does not reproduce even with the same seed. :/ (Disabling isolation means we give the program access to host randomness, which will be used to initialize the thread_rng.)

@RalfJung
Copy link
Member Author

RalfJung commented Feb 26, 2024

The backtrace was a bit different this time:

error: Undefined Behavior: trying to retag from <527506> for SharedReadWrite permission at alloc280832[0x0], but that tag does not exist in the borrow stack for this location
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:401:18
    |
401 |         unsafe { &*self.as_ptr().cast_const() }
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |                  |
    |                  trying to retag from <527506> for SharedReadWrite permission at alloc280832[0x0], but that tag does not exist in the borrow stack for this location
    |                  this error occurs as part of retag at alloc280832[0x0..0x28]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <527506> was created by a SharedReadWrite retag at offsets [0x0..0x21]
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^
help: <527506> was later invalidated at offsets [0x8..0x10] by a write access
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^
    = note: BACKTRACE (of the first span):
    = note: inside `std::ptr::NonNull::<std::sys::locks::rwlock::queue::Node>::as_ref::<'_>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:401:18: 401:46
    = note: inside `std::sys::locks::rwlock::queue::add_backlinks_and_find_tail` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:257:26: 257:42
    = note: inside `std::sys::locks::rwlock::queue::RwLock::unlock_queue` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:487:33: 487:76
    = note: inside `std::sys::locks::rwlock::queue::RwLock::unlock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:468:28: 468:51
    = note: inside `std::sys::locks::rwlock::queue::RwLock::read_unlock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:440:22: 440:50
    = note: inside `std::sys::locks::rwlock::queue::RwLock::read_unlock` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:416:36: 416:69
    = note: inside `<std::sync::RwLockReadGuard<'_, ()> as std::ops::Drop>::drop` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:715:13: 715:42
    = note: inside `std::ptr::drop_in_place::<std::sync::RwLockReadGuard<'_, ()>> - shim(Some(std::sync::RwLockReadGuard<'_, ()>))` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:515:1: 515:56
    = note: inside `std::mem::drop::<std::sync::RwLockReadGuard<'_, ()>>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/mem/mod.rs:992:24: 992:25
note: inside closure
   --> rwlock.rs:22:21
    |
22  |                     drop(r.read().unwrap());
    |                     ^^^^^^^^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

So the entry point of the bad use of the pointer is read this time, not write. But from unlock_queue onwards it's the same.

@RalfJung
Copy link
Member Author

I finally found some seeds. :) 2998, 5308, 7680. On Miri bbabee144ea12b210a7be9488f0eeed9efe3e24d:

$ MIRIFLAGS="-Zmiri-seed=2998 -Zmiri-track-weak-memory-loads -Zmiri-track-pointer-tag=669222" ./miri run --dep rwlock.rs --target aarch64-apple-darwin
[...]
note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:758:5
    |
758 |     r
    |     ^ created tag 669222 with SharedReadOnly/SharedReadWrite permission for frozen/non-frozen parts (pointee type std::sys::locks::rwlock::queue::Node) at alloc322898[0x0..0x28] derived from <669221>
    |
    = note: inside `std::ptr::from_ref::<std::sys::locks::rwlock::queue::Node>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:758:5: 758:6
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:348:32: 348:52
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:203:16
    |
203 |     fn prepare(&mut self) {
    |                ^^^^^^^^^ popped tracked tag for item [SharedReadOnly for <669222>] due to Unique retag from <690408> (that retag created <690409>)
    |
    = note: inside `std::sys::locks::rwlock::queue::Node::prepare` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:203:16: 203:25
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:340:17: 340:31
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:346:17
    |
346 |                 node.next.0 = AtomicPtr::new(state.mask(MASK).cast());
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ popped tracked tag for item [SharedReadWrite for <669222>] due to Write access for <571708>
    |
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:346:17: 346:70
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:347:17
    |
347 |                 node.prev = AtomicLink::new(None);
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ popped tracked tag for item [SharedReadWrite for <669222>] due to Write access for <571708>
    |
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:347:17: 347:50
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

error: Undefined Behavior: trying to retag from <669222> for SharedReadWrite permission at alloc322898[0x0], but that tag does not exist in the borrow stack for this location
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:401:18
    |
401 |         unsafe { &*self.as_ptr().cast_const() }
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |                  |
    |                  trying to retag from <669222> for SharedReadWrite permission at alloc322898[0x0], but that tag does not exist in the borrow stack for this location
    |                  this error occurs as part of retag at alloc322898[0x0..0x28]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <669222> was created by a SharedReadWrite retag at offsets [0x0..0x21]
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^
help: <669222> was later invalidated at offsets [0x8..0x10] by a write access
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^
    = note: BACKTRACE (of the first span):
    = note: inside `std::ptr::NonNull::<std::sys::locks::rwlock::queue::Node>::as_ref::<'_>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:401:18: 401:46
    = note: inside `std::sys::locks::rwlock::queue::add_backlinks_and_find_tail` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:265:17: 265:30
    = note: inside `std::sys::locks::rwlock::queue::RwLock::unlock_queue` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:487:33: 487:76
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:383:25: 383:48
    = note: inside `std::sys::locks::rwlock::queue::RwLock::read` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:295:13: 295:39
    = note: inside `std::sync::RwLock::<()>::read` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:273:13: 273:30
note: inside closure
   --> rwlock.rs:22:26
    |
22  |                     drop(r.read().unwrap());
    |                          ^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

@RalfJung
Copy link
Member Author

RalfJung commented Mar 2, 2024

Turns out this diff fixes all the 3 seeds that I found above:

--- a/library/std/src/sys/locks/rwlock/queue.rs
+++ b/library/std/src/sys/locks/rwlock/queue.rs
@@ -317,7 +317,7 @@ pub fn write(&self) {
     fn lock_contended(&self, write: bool) {
         let update = if write { write_lock } else { read_lock };
         let mut node = Node::new(write);
-        let mut state = self.state.load(Relaxed);
+        let mut state = self.state.load(Acquire);
         let mut count = 0;
         loop {
             if let Some(next) = update(state) {

Alternatively, this diff also does it:

--- a/library/std/src/sys/locks/rwlock/queue.rs
+++ b/library/std/src/sys/locks/rwlock/queue.rs
@@ -286,7 +286,7 @@ pub const fn new() -> RwLock {
 
     #[inline]
     pub fn try_read(&self) -> bool {
-        self.state.fetch_update(Acquire, Relaxed, read_lock).is_ok()
+        self.state.fetch_update(Acquire, Acquire, read_lock).is_ok()
     }
 
     #[inline]

So I think something is wrong with the memory orderings used somewhere. I can't tell which of these state accesses should be strengthened though. (And there might more 1-line diffs that suffice, I didn't try all the Relaxed.)

I also found some other suspicious orderings, namely RMW operations where the success case has a weaker ordering than the failure case:

match self.state.fetch_update(Release, Acquire, |state| {

match self.state.compare_exchange_weak(
state,
state.mask(!QUEUE_LOCKED),
Release,
Acquire,
) {

self.state.compare_exchange_weak(state, UNLOCKED, Release, Acquire)

IMO these justify a comment explaining why the orderings as so surprising, but they don't seem involved in this bug, at least not in the cases we have found so far.

bors added a commit to rust-lang/miri that referenced this issue Mar 2, 2024
print thread name in miri error backtraces; add option to track read/write accesses

This came up while debugging rust-lang/rust#121626. It didn't end up being useful there but still seems like good tools to have around.
bors added a commit to rust-lang/miri that referenced this issue Mar 2, 2024
print thread name in miri error backtraces; add option to track read/write accesses

This came up while debugging rust-lang/rust#121626. It didn't end up being useful there but still seems like good tools to have around.
bors added a commit to rust-lang/miri that referenced this issue Mar 2, 2024
print thread name in miri error backtraces; add option to track read/write accesses

This came up while debugging rust-lang/rust#121626. It didn't end up being useful there but still seems like good tools to have around.
@RalfJung
Copy link
Member Author

RalfJung commented Mar 3, 2024

Turn out I went down the wrong rabbit hole. Changing these orders means the program took a different path somewhere earlier in the execution so we didn't hit the bad interleaving any more. I found a seed that reproduces this issue even without weak memory emulation, so the orderings can't be the problem:

Miri 9272474b755dca7dfea4f96fd8344dd12e223ae8, -Zmiri-seed=11408 -Zmiri-disable-weak-memory-emulation

$ ./miri run --dep rwlock.rs --target aarch64-apple-darwin -Zmiri-seed=11408 -Zmiri-disable-weak-memory-emulation -Zmiri-track-pointer-tag=1389599
note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:758:5
    |
758 |     r
    |     ^ created tag 1389599 with SharedReadOnly/SharedReadWrite permission for frozen/non-frozen parts (pointee type std::sys::locks::rwlock::queue::Node) at alloc702309[0x0..0x28] derived from <1389598>
    |
    = note: BACKTRACE on thread `unnamed-9`:
    = note: inside `std::ptr::from_ref::<std::sys::locks::rwlock::queue::Node>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:758:5: 758:6
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:348:32: 348:52
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:203:16
    |
203 |     fn prepare(&mut self) {
    |                ^^^^^^^^^ popped tracked tag for item [SharedReadOnly for <1389599>] due to Unique retag from <1399547> (that retag created <1399548>)
    |
    = note: BACKTRACE on thread `unnamed-9`:
    = note: inside `std::sys::locks::rwlock::queue::Node::prepare` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:203:16: 203:25
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:340:17: 340:31
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:346:17
    |
346 |                 node.next.0 = AtomicPtr::new(state.mask(MASK).cast());
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ popped tracked tag for item [SharedReadWrite for <1389599>] due to Write access for <1272013>
    |
    = note: BACKTRACE on thread `unnamed-9`:
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:346:17: 346:70
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: tracking was triggered
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:347:17
    |
347 |                 node.prev = AtomicLink::new(None);
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ popped tracked tag for item [SharedReadWrite for <1389599>] due to Write access for <1272013>
    |
    = note: BACKTRACE on thread `unnamed-9`:
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:347:17: 347:50
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

error: Undefined Behavior: trying to retag from <1389599> for SharedReadWrite permission at alloc702309[0x0], but that tag does not exist in the borrow stack for this location
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:401:18
    |
401 |         unsafe { &*self.as_ptr().cast_const() }
    |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |                  |
    |                  trying to retag from <1389599> for SharedReadWrite permission at alloc702309[0x0], but that tag does not exist in the borrow stack for this location
    |                  this error occurs as part of retag at alloc702309[0x0..0x28]
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <1389599> was created by a SharedReadWrite retag at offsets [0x0..0x21]
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^
help: <1389599> was later invalidated at offsets [0x8..0x10] by a write access
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^
    = note: BACKTRACE (of the first span) on thread `unnamed-9`:
    = note: inside `std::ptr::NonNull::<std::sys::locks::rwlock::queue::Node>::as_ref::<'_>` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/core/src/ptr/non_null.rs:401:18: 401:46
    = note: inside `std::sys::locks::rwlock::queue::add_backlinks_and_find_tail` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:265:17: 265:30
    = note: inside `std::sys::locks::rwlock::queue::RwLock::unlock_queue` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:487:33: 487:76
    = note: inside `std::sys::locks::rwlock::queue::RwLock::lock_contended` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:383:25: 383:48
    = note: inside `std::sys::locks::rwlock::queue::RwLock::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sys/locks/rwlock/queue.rs:312:13: 312:38
    = note: inside `std::sync::RwLock::<()>::write` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/library/std/src/sync/rwlock.rs:361:13: 361:31
note: inside closure
   --> rwlock.rs:20:26
    |
20  |                     drop(r.write().unwrap());
    |                          ^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

What makes this particularly strange is that everything relevant seems to happen on the same thread... not sure what to make of that.

RalfJung pushed a commit to RalfJung/rust that referenced this issue Mar 3, 2024
…=RalfJung

tree borrows: add a test to sb_fails

This is something that happens in the wild (rust-lang#121626), so TB accepting this is good. Let's make sure we notice if this ever changes.
RalfJung pushed a commit to RalfJung/rust that referenced this issue Mar 3, 2024
…RalfJung

print thread name in miri error backtraces; add option to track read/write accesses

This came up while debugging rust-lang#121626. It didn't end up being useful there but still seems like good tools to have around.
@joboet
Copy link
Member

joboet commented Mar 3, 2024

I found the bug! It's an ABA problem. When a new node is added, its thread basically executes:

let next = state.load();
node.next = next;
state.compare_exchange(next, &node);

In the failing executions, the thread gets interrupted for a long time, long enough that the thread at the head of the queue gets woken up (invalidating all shared references to it) and gets readded to the queue (with a new shared reference). The CAS succeeds, because the pointer has the same bitpattern but once we try to perform an access through the pointer, we obviously incur UB.

This is very hard to fix, since the bug can occur anytime a Node allocation has the same address as a previous one. At the moment, the only solution I see is to use lossy provenance and make miri guess the right borrow permissions.

@RalfJung
Copy link
Member Author

RalfJung commented Mar 3, 2024

Nice catch!

Closing in favor of #121950.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants