Reference counting of std::shared_ptr is non-atomic when using the mold linker #1286

Closed
dutor opened this issue Jun 20, 2024 · 5 comments

dutor commented Jun 20, 2024

First of all, thanks for this great linker. It's blazing fast and saves me a lot of build time.

My environment:

  • OS: Ubuntu 20.04
  • GCC: 9.4.0
  • mold: 2.30.0 & 2.32.0

Simplified code to reproduce this issue:

// file main.cpp
#include <stdio.h>
#include <memory>

using SP = std::shared_ptr<int>;
// run() will launch multiple threads to incr/decr the refcnt of the given shared_ptr.
// upon return from run() all copies of the shared_ptr will be released.
void run(SP sp, SP(*)(const SP&));
SP copy(const SP &sp) {
    return sp;
}
int main() {
    auto sp = std::make_shared<int>(0);
    run(sp, copy);
    // Here we expect use_count() to be 1.
    // use_count() returns long, so print it with %ld.
    fprintf(stderr, "use_count: %ld\n", sp.use_count());
    return 0;
}

// file run.cpp
#include <stdio.h>
#include <memory>
#include <thread>
#include <vector>

using SP = std::shared_ptr<int>;

void run(SP sp, SP (*copy)(const SP&)) {
    static constexpr auto N = 4UL;
    std::vector<std::thread> threads;
    threads.reserve(N);

    auto func = [=] (SP ptr) {
        static constexpr auto kLoops = 5000000UL;
        std::vector<SP> sps;
        sps.reserve(kLoops);
        for (auto i = 0UL; i < kLoops; i++) {
            sps.push_back(copy(ptr));
        }
    };

    for (auto i = 0UL; i < N; i++) {
        threads.emplace_back(func, sp);
    }

    for (auto i = 0UL; i < N; i++) {
        threads[i].join();
    }
}

Build & run

$ export MOLD_PATH=./usr/libexec/mold
$ g++ -pthread -B$MOLD_PATH -fPIC -shared run.cpp -o librun.so
$ g++ -pthread -B$MOLD_PATH main.cpp -L. -lrun  -o main
$ ./main
use_count: 2199319

Some facts:

  • Reference counting of shared_ptr is atomic only if the process is multithreaded. Refer to here
  • Multithreading is detected via __gthread_active_p, which relies on a weak reference to __pthread_key_create from libpthread. Refer to here
  • When linked with the mold linker, the reference count is non-atomic because __gthread_active_p returns false.
  • In the same environment, ld.bfd, ld.gold and mold 2.1.0 are OK.
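
The effect of the first and third facts can be modeled with a small sketch. This is not libstdc++'s code: add_dispatch and count_with are hypothetical names, and the threads_active flag stands in for the result of __gthread_active_p. std::atomic is used on both paths only so that the sketch itself has no undefined behavior; the non-atomic path is modeled as a separate load and store.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical model of a reference count whose update path is chosen at run time.
// `threads_active` stands in for the result of __gthread_active_p().
inline void add_dispatch(std::atomic<long>& count, long delta, bool threads_active) {
    if (threads_active) {
        // Atomic read-modify-write: no update is lost under concurrency.
        count.fetch_add(delta, std::memory_order_acq_rel);
    } else {
        // Separate load and store: another thread can update the count in
        // between, and that update is then overwritten.
        count.store(count.load(std::memory_order_relaxed) + delta,
                    std::memory_order_relaxed);
    }
}

// Starts `threads` threads that each add 1 to a shared count `loops` times,
// then returns the final value of the count.
inline long count_with(bool threads_active, unsigned threads, long loops) {
    std::atomic<long> count{0};
    std::vector<std::thread> pool;
    pool.reserve(threads);
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&count, threads_active, loops] {
            for (long i = 0; i < loops; ++i) {
                add_dispatch(count, 1, threads_active);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return count.load();
}
```

Calling count_with(true, 4, 5000000) always yields 20000000, while count_with(false, 4, 5000000) can come out lower because increments are lost. A wrong use_count after run() is the same kind of lost update.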

Thanks in advance.

Owner

rui314 commented Jun 21, 2024

Thank you for your report. This is a very difficult issue and arguably a bug in glibc rather than the linker. At least the code is very fragile as it depends on when a weak symbol is resolved.

Even with GNU ld, if you compile your main executable with g++ -pthread main.cpp ./librun.so -o main -fno-PIC -no-pie -Wl,-allow-shlib-undefined, the output is broken just like mold's output. Or, if you use LLVM lld and compile with g++ -pthread main.cpp ./librun.so -o main -fno-PIC -no-pie, the result is the same.

Let me think more about how to fix this. By the way, how did you find this problem?

rui314 closed this as completed in 06b5926 Jun 22, 2024
Owner

rui314 commented Jun 22, 2024

Please try again with git head.

Author

dutor commented Jun 24, 2024

Thanks for the reply and fix!

> By the way, how did you find this problem?

We experienced several intermittent memory issues, such as heap-use-after-free on the control block of shared_ptr and leaks of resources managed by shared_ptr. So we traced every reference counting operation of shared_ptr and found that atomicity was compromised. From there we worked our way to the runtime atomic dispatch and the weak reference to the pthread symbol.

Interestingly, we initially fixed this bug by linking against libpthread explicitly with --no-as-needed for every binary (just like the fix 06b5926). But that seems not to be encouraged, as per the -pthread option. Then we discovered that other linkers like ld and gold don't have this issue (for our build options).
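
One way to check whether a given binary sees the pthread symbol is to probe the same weak reference at startup. The following is a minimal sketch, assuming GCC or Clang on a glibc-based Linux system. pthread_symbol_resolved and threading_mode are hypothetical names, and declaring the reserved identifier __pthread_key_create by hand is for illustration only. The result reflects how the binary containing this code was linked. On newer glibc releases, where libpthread is merged into libc, the probe is expected to always succeed.

```cpp
#include <string>

// Weak declaration: linking succeeds even if no library defines the symbol,
// in which case its address is null at run time.
// Assumption: the signature mirrors pthread_key_create, with pthread_key_t
// being unsigned int as on glibc.
extern "C" int __pthread_key_create(unsigned int*, void (*)(void*))
    __attribute__((weak));

// Returns true when the weak reference resolved to a real definition.
inline bool pthread_symbol_resolved() {
    return &__pthread_key_create != nullptr;
}

// Describes which reference counting path such a check would select.
inline std::string threading_mode() {
    return pthread_symbol_resolved() ? "atomic" : "non-atomic";
}
```

Printing threading_mode() at the top of main in each binary shows which side of the check that binary lands on.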

> This is a very difficult issue and arguably a bug in glibc rather than the linker

I'm not a toolchain guy. Is there any related discussion of this problem? I'd like to understand it better.

Owner

rui314 commented Jun 24, 2024

I wrote an explanation of the issue in the commit message, so you may want to read that first if you want to understand it better. Feel free to ask any questions!

Owner

rui314 commented Jun 24, 2024

I think this is worth making a new release. I'll be releasing mold 2.32.1 soon.
