Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TSan false positive with standalone memory fence #1415

Open
alexey-katranov opened this issue Jun 17, 2021 · 11 comments
Open

TSan false positive with standalone memory fence #1415

alexey-katranov opened this issue Jun 17, 2021 · 11 comments

Comments

@alexey-katranov
Copy link

Thread Sanitizer reports a race over data in the following code snippet. I do not know if TSan supports standalone fences but I see some __tsan_atomic_thread_fence in the code. In accordance with the C++ standard the code seems valid: the release operation X is observed with any fenced load Y followed with acquire fence F should give us happens before between operations before X and after F. (strictly speaking there is the second path over numThreadsFinished++ but it seems pretty simple) (surely if Y operation with acquire fence, everything is Ok).

#include <atomic>
#include <thread>

std::atomic<int> numThreadsFinished{};
std::atomic<int> epoch{};

void wait() {
    if (numThreadsFinished++ == 0) {
        while (epoch.load(std::memory_order_relaxed) == 0) {} // Y
        std::atomic_thread_fence(std::memory_order_acquire); // F
        return;
    }
    epoch++; // X
}

int main() {
    int data{};
    std::thread thr([&] {
        data = 1;
        wait();
    });
    wait();
    if (data != 1) {
        return -1;
    }
    thr.join();
    return 0;
}
[user@localhost build]$ ./clang_10.0_cxx11_64_debug/test_task
LLVMSymbolizer: error reading file: No such file or directory
==================
WARNING: ThreadSanitizer: data race (pid=2277694)
  Read of size 4 at 0x7ffd3fe3f2b8 by main thread:
    #0 main /home/user/c/github/oneTBB/test/tbb/test_task.cpp:24:9 (test_task+0x4cefce)

  Previous write of size 4 at 0x7ffd3fe3f2b8 by thread T1:
    #0 main::$_0::operator()() const /home/user/c/github/oneTBB/test/tbb/test_task.cpp:20:14 (test_task+0x4cf768)
    #1 void std::__invoke_impl<void, main::$_0>(std::__invoke_other, main::$_0&&) /usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:60:14 (test_task+0x4cf710)
    #2 std::__invoke_result<main::$_0>::type std::__invoke<main::$_0>(main::$_0&&) /usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:95:14 (test_task+0x4cf620)
    #3 void std::thread::_Invoker<std::tuple<main::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/thread:264:13 (test_task+0x4cf5c8)
    #4 std::thread::_Invoker<std::tuple<main::$_0> >::operator()() /usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/thread:271:11 (test_task+0x4cf568)
    #5 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::$_0> > >::_M_run() /usr/bin/../lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/thread:215:13 (test_task+0x4cf44f)
    #6 <null> <null> (libstdc++.so.6+0xd8ad3)

  Location is stack of main thread.

  Location is global '??' at 0x7ffd3fe21000 ([stack]+0x00000001e2b8)

  Thread T1 (tid=2277696, finished) created by main thread at:
    #0 pthread_create <null> (test_task+0x464181)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xd8d98)
    #2 main /home/user/c/github/oneTBB/test/tbb/test_task.cpp:19:17 (test_task+0x4cefb3)

SUMMARY: ThreadSanitizer: data race /home/user/c/github/oneTBB/test/tbb/test_task.cpp:24:9 in main
==================
ThreadSanitizer: reported 1 warnings
@dvyukov
Copy link
Contributor

dvyukov commented Jun 17, 2021

#1352 is related

@alexey-katranov
Copy link
Author

@dvyukov , I prototyped simple implementation of standalone fences. The idea is that each thread has release fence clock and acquire fence clock. Additionally, SyncVar is extended with clock for the fences. The algorithm is following:

  1. The release fence updates the release fence clock
  2. Any atomic store (incl. relaxed) copies the thread release fence clock into SyncVar fence clock
  3. Any atomic load (incl. relaxed) copies SyncVar fence clock into the thread acquire fence clock
  4. The acquire fence updates thread clock with acquire fence clock

The prototype seems to work correctly for the example above:

#include <atomic>
#include <thread>

std::atomic<int> numThreadsFinished{};
std::atomic<int> epoch{};

int racy_data{};

void wait() {
    if (numThreadsFinished++ == 0) {
        while (epoch.load(std::memory_order_relaxed) == 0) {} // Y

        volatile auto racy_load = racy_data; // RACE is reported!
        (void)racy_load;

        std::atomic_thread_fence(std::memory_order_acquire); // F
        return;
    }
    epoch++; // X
}

int main() {
    int data{};
    std::thread thr([&] {
        data = 1;
        racy_data = 1;
        wait();
    });
    wait();
    if (data != 1) { // RACE is NOT reported!
        return -1;
    }
    thr.join();
    return 0;
}

Did you think about such implementation in the past? Does it contain hidden issues that are difficult to resolve? Are you interested in further development? Surely, the prototype works only for this example and really suboptimal.

@dvyukov
Copy link
Contributor

dvyukov commented Jun 28, 2021

Hi Alex,

Please see:
https://reviews.llvm.org/D47107#1125052
It should answer some of your questions.

Overall I think tsan needs to support standalone fences. There are correct programs that use fences and there are no good workarounds for these cases. However, if it increases runtime costs, it should be guarded by a separate flag.

Another aspect is that I have a change in-flight which effectively rewrites all of the runtime:
https://github.com/dvyukov/llvm-project/pull/3/files
It's not ready but we should decide soon on its fate. I am not sure what's the right way to order this change vs your change, I guess it may depend on the size of your change.

But either way a good first step would be a set of positive/negative test cases that will now just document the current behavior, but will also contains the expected behavior as comments

@alexey-katranov
Copy link
Author

https://reviews.llvm.org/D47107#1125052
It should answer some of your questions.

It seems I reinvent the wheel. It was just one evening effort, so it is definitely better to finish your major change. Moreover, I am not in hurry and it is our of scope of my job. As for initial problem, we just rework the code not to use standalone fences.

It seems that the above review has not been updated for three years. What did block the patch and why not to finish it?

As I understand, ktsan implements the fences. It seems that it reimplements the logic of tsan, why it was not separated into a common module?

As for tests, I can try prepare some tests, do you have any BKMs, how to build and run tests (it would be good if I do not need to build clang).

@dvyukov
Copy link
Contributor

dvyukov commented Jun 28, 2021

BKMs

@dvyukov dvyukov closed this as completed Jun 28, 2021
@dvyukov
Copy link
Contributor

dvyukov commented Jun 28, 2021

https://reviews.llvm.org/D47107#1125052
It should answer some of your questions.

It seems I reinvent the wheel. It was just one evening effort, so it is definitely better to finish your major change. Moreover, I am not in hurry and it is our of scope of my job. As for initial problem, we just rework the code not to use standalone fences.

A non-upstreammed wheel does not count :)

It seems that the above review has not been updated for three years. What did block the patch and why not to finish it?

I have no idea.
That's how OSS works. You spend time writing an extensive reply and then you just never hear from the author again :)

As I understand, ktsan implements the fences. It seems that it reimplements the logic of tsan, why it was not separated into a common module?

Different languages, different code styles, different environments, different specifics.

As for tests, I can try prepare some tests, do you have any BKMs, how to build and run tests (it would be good if I do not need to build clang).

I don't think there is a way to avoid building llvm, but you need to do this only once.
We have these docs, but there also should be some llvm docs:
https://github.com/google/sanitizers/wiki/AddressSanitizerHowToBuild
For testing you run ninja check-tsan in the build dir. There is also a way to run a single test:
https://reviews.llvm.org/D103054

@RalfJung
Copy link

RalfJung commented Jul 1, 2022

The problem with tsan not supporting fences still exists, right? Shouldn't the issue be reopened, or is this tracked somewhere else?

@dvyukov
Copy link
Contributor

dvyukov commented Jul 1, 2022

Not sure why I closed it. I usually write explanations when closing issues. Maybe just hit Close accidentally...

@dvyukov dvyukov reopened this Jul 1, 2022
@OlivierNicole
Copy link

Is there any form of annotation or other to avoid a false positive due to TSan not taking into account an acquire fence?

@dvyukov
Copy link
Contributor

dvyukov commented Jul 6, 2022

  1. Change the code to not use stand-alone fences.
  2. There are __tsan_acquire/release annotations.
  3. It's also possible to use a fake aromic operations with acquire/release ordering in tsan build as annotations.

@OlivierNicole
Copy link

I see. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants