Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linking with mold breaks initialization of thread local storage in Rust under FreeBSD on AMD64 #1338

Closed
mqudsi opened this issue Aug 29, 2024 · 6 comments

Comments

@mqudsi
Copy link
Contributor

mqudsi commented Aug 29, 2024

Compiling fish-shell from git master with rust 1.79.0 (latest from FreeBSD pkg repos) and mold 2.31.0 (latest from pkg repos) causes random segfaults that were traced back to misbehaving thread_local! static variables.

Everything starts off fine, but after a few seconds of using the shell (typing at the prompt) it segfaults due to an incorrect tls thread id making some assertions fail. I edited the code some to add some debug stderr output to see what was going on and it seems that either after a certain number of times (but more likely from certain call sites) thread local storage is not being initialized upon first access by a new thread, and it is retaining its default (zero) value.

For reference, here's the (patched) function in question that assigns a custom thread id based off of an atomic variable that is incremented once per thread:

#[inline(never)]
pub fn thread_id() -> usize {
    static THREAD_COUNTER: AtomicUsize = AtomicUsize::new(0);

    thread_local! {
        static THREAD_ID: usize = (|| {
            let id = THREAD_COUNTER.fetch_add(1, Ordering::Relaxed);
            eprintln!("Rust thread {:?} assigned fish thread id {}",
                std::thread::current().id(), id);
            id
        })();
    }
    let id = THREAD_ID.with(|id| *id);

    // ThreadId(1) is assigned (by rust's std) to the main thread upon startup
    if std::thread::current().id() != unsafe { std::mem::transmute(1_u64) } {
        eprintln!("Rust thread {:?} accessing fish thread_id({})", std::thread::current().id(), id);
    }
    id
}

Under FreeBSD, compiling with environment variable RUSTFLAGS set to -C link-args=--ld-path=/usr/local/bin/mold, the following incorrect behavior is observed:

  • At the start of execution, new background threads emit the first eprintln!() call printing the mapping from the rust-native thread id to the simple usize fish thread id when they first access thread_id() and then they print the second eprintln!() call upon that and each subsequent invocation
  • Then (some seconds later, not during a race) a newly created thread (with rust thread id that is not ThreadId(1)) is observed calling thread_id() but not entering the thread_local!() initializer; it prints the second eprintln!() call with an incorrect id value of 0 and is never observed to enter the thread_local!() initializer.

Compiling and running fish from master under FreeBSD:

$ pkg install git-lite cmake ninja rust mold
$ git clone https://github.com/fish-shell/fish-shell.git
$ cd fish-shell
$ env RUSTFLAGS="-C link-args=--ld-path=/usr/local/bin/mold" make
$ build/fish

This behavior is observed with the default (DEBUG) build, without any optimizations or architecture-specific optimizations enabled. Tested under FreeBSD 13.3. The same procedure without the RUSTFLAGS environment variable set to specify building with mold results in a binary that works correctly.

Happy to post any differential outputs of the mold-linked and normal builds as needed to help pin this down. For what it's worth, I do not observe any similar behavior using mold to link fish under Linux/AMD64.


Edit: this version of the function more directly illustrates the problem; the debug_assert_ne!() call results in a panic because TLS was not initialized. Again, this doesn't happen right off the bat; it's after some time has passed/from a particular call site.

diff --git a/src/threads.rs b/src/threads.rs
index 06b59649f..a03bbe0af 100644
--- a/src/threads.rs
+++ b/src/threads.rs
@@ -89,3 +89,3 @@ fn main_thread_id() -> usize {
 fn thread_id() -> usize {
-    static THREAD_COUNTER: AtomicUsize = AtomicUsize::new(0);
+    static THREAD_COUNTER: AtomicUsize = AtomicUsize::new(42);
     // It would be much nicer and faster to use #[thread_local] here, but that's nightly only.
@@ -97,3 +97,5 @@ fn thread_id() -> usize {
     }
-    THREAD_ID.with(|id| *id)
+    let id = THREAD_ID.with(|id| *id);
+    debug_assert_ne!(id, 0, "Thread local storage not initialized!");
+    id
 }
@mqudsi mqudsi changed the title Linking with mold breaks some thread local variables in Rust under FreeBSD on AMD64 Linking with mold breaks initialization of thread local storage in Rust under FreeBSD on AMD64 Aug 29, 2024
@mqudsi
Copy link
Contributor Author

mqudsi commented Aug 29, 2024

Good news, it seems not to reproduce with mold 1.11.0 so it's probably possible to bisect this if you don't have a more direct way of arriving at the cause.

@mqudsi
Copy link
Contributor Author

mqudsi commented Aug 31, 2024

This bisected to c395da1, which was a fix for #1216

c395da1c5414656e1b22f3e71a42ba17c51673a4 is the first bad commit
commit c395da1c5414656e1b22f3e71a42ba17c51673a4 (HEAD)
Author: Rui Ueyama <ruiu@cs.stanford.edu>
Date:   Wed Mar 13 16:01:02 2024 +0900

    Do not create an unnecessary gap in file for BSS
    
    Fixes https://github.com/rui314/mold/issues/1216

 elf/passes.cc | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

@rui314
Copy link
Owner

rui314 commented Aug 31, 2024

I don't have a FreeBSD machine, and it is not easy to set up a new one. I usually use Docker, but that works only for Linux guests. Did you know if there's a FreeBSD machine that I can login to debug this issue?

@mqudsi
Copy link
Contributor Author

mqudsi commented Aug 31, 2024

Hey, thanks for your interest in looking into this. You should be able to ssh rui314@dev.neosmart.net -p 22314 into a freshly installed copy of FreeBSD 13.3 with basic dev tools installed (neovim/nvim is also installed).

pkg search pkg-name and sudo pkg install pkg-name is probably the core FreeBSD thing to know.

@rui314 rui314 closed this as completed in f6822fb Aug 31, 2024
@rui314
Copy link
Owner

rui314 commented Aug 31, 2024

Thank you for the test environment. The above fix should work for you. You can remove the test environment now.

@mqudsi
Copy link
Contributor Author

mqudsi commented Aug 31, 2024

Thanks for the fix; that was fast! ❤️

mqudsi added a commit to fish-shell/fish-shell that referenced this issue Aug 31, 2024
Worth including because mold is rather popular in the rust world and because the
bug affects mold versions coincident with the development of the fish rust port.

The bug affects all currently released versions of mold from 2.30.0 (Mar 2024)
onwards under at least FreeBSD (though quite likely other platforms as well).

See rui314/mold#1338 for reference.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants
@rui314 @mqudsi and others