Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change CALL_ONCE_INITIALIZATION implementation to use static initialization #4818

Closed
wants to merge 1 commit into from

Conversation

MarijnS95
Copy link
Contributor

Fixes #4792, see that issue for more context.

Apply LLVM upstream patch https://reviews.llvm.org/D19271.

The previous implementation deadlocks when used concurrently from many threads at once.

…zation

Apply LLVM upstream patch https://reviews.llvm.org/D19271.

The previous implementation deadlocks when used concurrently from
many threads at once.
@AppVeyorBot
Copy link

@llvm-beanz
Copy link
Collaborator

There were a lot of challenges back in 2016 related to libstdc++ having a broken std::call_once on some platforms, and before that LLVM didn't require C++ 11... which is terrifying and weird right?

Now that std::call_once is a decade old I think we can rely on it working. I'd prefer a simpler solution using std::call_once, rather than pulling in an LLVM patch that was never merged.

@llvm-beanz llvm-beanz self-requested a review November 28, 2022 17:16
@MarijnS95
Copy link
Contributor Author

@llvm-beanz thanks for checking in; I may not have included all the context from #4792 in this patch but the gist of it is that my backporting of std::once doesn't solve the lockup, in turn getting lockups inside __gthread_once:

(gdb) thread 23
[Switching to thread 23 (Thread 0x7f636bfff6c0 (LWP 146834))]
#0  0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007f63bd1bc821 in ?? () from /usr/lib/libc.so.6
#1  0x00007f63a47069a3 in __gthread_once (__once=0x7f63a5821800 <InitializeLoopSimplifyPassFlag>, __func=0x80)
    at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../include/c++/12.2.0/x86_64-pc-linux-gnu/bits/gthr-default.h:700
#2  std::call_once<void* (&)(llvm::PassRegistry&), std::reference_wrapper<llvm::PassRegistry> > (__once=..., __f=<optimized out>,
    __args=...) at /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../include/c++/12.2.0/mutex:859
#3  llvm::call_once<void* (&)(llvm::PassRegistry&), std::reference_wrapper<llvm::PassRegistry> > (flag=..., F=<optimized out>,
    ArgList=...) at include/llvm/Support/Threading.h:92
#4  llvm::initializeLoopSimplifyPass (Registry=...)
    at lib/Transforms/Utils/LoopSimplify.cpp:751
... same as above

Perhaps I did not pick all the patches needed to finalize the migration, but it seems simple enough to not suffer from any deadlocks already. Suggestions/help welcome!

@MarijnS95
Copy link
Contributor Author

There were a lot of challenges back in 2016 related to libstdc++ having a broken std::call_once on some platforms, and before that LLVM didn't require C++ 11... which is terrifying and weird right?

Yes I find C++ and everything terrifying and weird all around, which is why I've switched over to Rust 😉

@llvm-beanz
Copy link
Collaborator

I don't think we should use llvm::call_once either... The llvm namespace versions of these implementations only exist to handle cases where the std version either doesn't work or isn't available everywhere. If gthread_once is locking up on modern implementations that's... terrifying.

@llvm-beanz
Copy link
Collaborator

Does something like this not work?

#define CALL_ONCE_INITIALIZATION(function)                                     \
  std::once_flag flag;                                                         \
  std::call_once(flag, [&Registry]() { function(Registry); });

@mrsaturnsan
Copy link

https://en.cppreference.com/w/cpp/thread/once_flag

I think the std::once_flag object would need to be static.

@llvm-beanz
Copy link
Collaborator

https://en.cppreference.com/w/cpp/thread/once_flag

I think the std::once_flag object would need to be static.

Yes, you are correct.

@mrsaturnsan
Copy link

I don't know if any scope related issues will pop up, but just in case:

#define CALL_ONCE_INITIALIZATION(function)                                     \
  {                                                                            \
    static std::once_flag flag;                                                \
    std::call_once(flag, [&Registry]() { function(Registry); });               \
  }

@llvm-beanz
Copy link
Collaborator

There shouldn't be any scoping issues. The existing implementations of the macros don't have problems and these are only used as macros inside macros...

@mrsaturnsan
Copy link

mrsaturnsan commented Nov 28, 2022

Hmm, i'm getting build errors saying std::call_once is not a member of std. I included mutex at the top as well.

@mrsaturnsan
Copy link

Ah, never mind. I had to write it like this to get it to compile:

#define CALL_ONCE_INITIALIZATION(function)                                     \
    static std::once_flag flag;                                                \
    std::call_once(flag, [&]() { function(Registry); });

@MarijnS95
Copy link
Contributor Author

First up, while looking at this patch on a 2-week-old DXC at b66e05be9458677faf5dcf4eeda76dad6b702738 (somewhere between then and now a SPIR-V codegen bug snuck in, that I have not yet had the opportunity to bisect and report), it now also locked up right at:

https://github.com/MarijnS95/DirectXShaderCompiler/blob/b66e05be9458677faf5dcf4eeda76dad6b702738/lib/Analysis/TargetTransformInfo.cpp#L311-L312

#0  0x00007fa1af0317fd in syscall () from /usr/lib/libc.so.6
#1  0x00007fa1af2a61d1 in __cxxabiv1::__cxa_guard_acquire (g=0x7fa0ebe18a10 <guard variable for llvm::initializeTargetTransformInfoWrapperPassPass(llvm::PassRegistry&)::initialized_once>) at /usr/src/debug/gcc/libstdc++-v3/libsupc++/guard.cc:319
#2  0x00007fa0eb70931a in llvm::initializeTargetTransformInfoWrapperPassPass (Registry=...) at lib/Analysis/TargetTransformInfo.cpp:311
#3  llvm::TargetTransformInfoWrapperPass::TargetTransformInfoWrapperPass (this=this@entry=0x7fa14c0722e0, TIRA=...) at lib/Analysis/TargetTransformInfo.cpp:326
#4  0x00007fa0eb7094a5 in llvm::createTargetTransformInfoWrapperPass (TIRA=...) at lib/Analysis/TargetTransformInfo.cpp:337
#5  0x00007fa0ead4f7c8 in (anonymous namespace)::EmitAssemblyHelper::getPerFunctionPasses (this=0x7fa1867ea388) at tools/clang/lib/CodeGen/BackendUtil.cpp:111
#6  (anonymous namespace)::EmitAssemblyHelper::CreatePasses (this=0x7fa1867ea388) at tools/clang/lib/CodeGen/BackendUtil.cpp:458
#7  (anonymous namespace)::EmitAssemblyHelper::EmitAssembly (this=<optimized out>, Action=<optimized out>, OS=<optimized out>) at tools/clang/lib/CodeGen/BackendUtil.cpp:688
#8  clang::EmitBackendOutput (Diags=..., CGOpts=..., TOpts=..., LOpts=..., TDesc=..., M=<optimized out>, Action=<optimized out>, OS=<optimized out>) at tools/clang/lib/CodeGen/BackendUtil.cpp:770
#9  0x00007fa0ead3ecc1 in clang::BackendConsumer::HandleTranslationUnit (this=0x7fa14d5b6f00, C=...) at tools/clang/lib/CodeGen/CodeGenAction.cpp:191
#10 0x00007fa0eb55b3ab in clang::ParseAST (S=..., PrintStats=false, SkipFunctionBodies=<optimized out>) at tools/clang/lib/Parse/ParseAST.cpp:160
#11 0x00007fa0ead3e0e8 in clang::CodeGenAction::ExecuteAction (this=0x7fa1867ead98) at tools/clang/lib/CodeGen/CodeGenAction.cpp:807
#12 0x00007fa0eaea336e in clang::FrontendAction::Execute (this=0x7fa1867ead98) at tools/clang/lib/Frontend/FrontendAction.cpp:455
#13 0x00007fa0ea923556 in DxcCompiler::Compile (this=this@entry=0x7fa14cc65c70, pSource=pSource@entry=0x7fa1867eb9b8, pArguments=pArguments@entry=0x7fa14df034c0, argCount=22, pIncludeHandler=0x7fa14cdadca0, riid=..., ppResult=0x7fa1867eb968)
    at tools/clang/tools/dxcompiler/dxcompilerobj.cpp:1048
#14 0x00007fa0ea91ea80 in hlsl::DxcCompilerAdapter::WrapCompile (this=<optimized out>, bPreprocess=<optimized out>, pSource=<optimized out>, pSourceName=<optimized out>, pEntryPoint=<optimized out>, pTargetProfile=0x7fa14ddfae20 L"cs_6_6", pArguments=<optimized out>,
    argCount=<optimized out>, pDefines=<optimized out>, defineCount=<optimized out>, pIncludeHandler=<optimized out>, ppResult=<optimized out>, ppDebugBlobName=<optimized out>, ppDebugBlob=<optimized out>)
    at tools/clang/tools/dxcompiler/dxcompilerobj.cpp:1827

Something really fishy is going on. I'll get to testing the above suggestion soon, but all seems futile now (or my machine is just broken?).

@llvm-beanz
Copy link
Collaborator

That's fun... I wish I could say I'm surprised, but I spent enough time debugging LLVM's call_once implementation, so I'm not at all surprised.

The C++ standard dictates that static initialization is thread-safe, but I'm not actually surprised that this might not work as expected.

One of the things on my to-do list is to try building DXC with thread sanitizer enabled to see if we can identify issues like this.

@MarijnS95
Copy link
Contributor Author

It is surprising, because the patch did solve the lockups on our CI completely... To be able to reproduce it, repeatedly on my own machine without much effort is incredibly strange.

Let me know when you get thread sanitizer to work, I'll gladly run it in hopes of getting to the bottom of this.

@llvm-beanz
Copy link
Collaborator

Adding -DLLVM_USE_SANITIZER=Thread to the CMake configuration line seems to produce a working build of DXC for me, so it might work enough to be useful now.

@MarijnS95
Copy link
Contributor Author

Thanks! Trying with the following full command:

cmake -Bbuild -GNinja -C cmake/caches/PredefinedParams.cmake -DSPIRV_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_SANITIZER=Thread

Note that per #4792 (comment) Release vs RelWithDebInfo also seems to have an effect.

@MarijnS95
Copy link
Contributor Author

symbol lookup error: ./libdxcompiler.so: undefined symbol: __tsan_init missing -ltsan perhaps?

@llvm-beanz
Copy link
Collaborator

Weird... The only difference in my build is I also pass -DLLVM_USE_LINKER=lld. It wouldn't surprise me if LLD handles the sanitizer runtimes more smoothly than the system ld.

@MarijnS95
Copy link
Contributor Author

Hmm, that doesn't seem to make a difference. I even tried hacking in a (probably wrong) link_libraries(tsan) but the compilation keeps failing with FATAL: ThreadSanitizer: unexpected memory mapping 0x10000000000-0x6af0e62c000 then.

@llvm-beanz
Copy link
Collaborator

That's unexpected... What is your host OS and CPU architecture? Can you attach a debugger and figure out what is mapped in that region?

TSan needs to identify the mapped memory regions and it looks like your process has something mapped TSan isn't expecting. That might be a build configuration issue with your toolchain, or it might be a bug in TSan, or something completely different...

@MarijnS95
Copy link
Contributor Author

MarijnS95 commented Nov 30, 2022

As said I may have added that wrongly. Now, adding:

if(LLVM_USE_SANITIZER)
  target_link_libraries(dxcompiler PUBLIC tsan)
endif ()

right after add_clang_library(dxcompiler ...) gets it linking with libtsan.so.2 as NEEDED in the dynamic section. However, now it fails dlopening with /usr/lib/libtsan.so.2: cannot allocate memory in static TLS block :)

@llvm-beanz
Copy link
Collaborator

Yea... none of that should be needed. Clang should add the appropriate linker flags based on the sanitizer usage, and I would expect libtsan.so is probably not the correct runtime library. I would instead have expected a static linkage against libclang_rt.tsan-x86_64.a and libclang_rt.tsan-cxx-x86_64.a (unless you're not linux-x86_64).

What is your host OS and CPU?

@MarijnS95
Copy link
Contributor Author

MarijnS95 commented Nov 30, 2022

Right, yes, static makes more sense for this. No idea why it isn't happening.

Host OS is Arch Linux (syncing a few times a week), CPU is a 3970x. Will play around with this on a different albeit similar system and see if/where the libraries are, and why they're not linked against, in turn leaving __tsan_init undefined.

@MarijnS95
Copy link
Contributor Author

MarijnS95 commented Dec 1, 2022

Running into the same symbol lookup error: ./libdxcompiler.so: undefined symbol: __tsan_init error.

@llvm-beanz just making sure: are you loading libdxcompiler.so externally or running the dxc binary? From the following it looks like dxc links statically against the tsan library instead of libdxcompiler.so:

$ nm -D DirectXShaderCompiler/build/bin/dxc | rg tsan_init
00000000000c8060 T __tsan_init
$ nm -D DirectXShaderCompiler/build/lib/libdxcompiler.so | rg tsan_init
                 U __tsan_init

In turn making it seem like this is working via the dxc binary, but libdxcompiler.so is not self-supporting anymore. Perhaps it's just configured on the wrong target, I'll dive in the CMake scripts later unless you beat me to it. EDIT: Or is it so that the host application will have to provide this symbol and is not supported to be used by an arbitrary runtime library alone, in which case I'd have to figure out how - if at all - to enable this in Rust where I load the compiler library?

@llvm-beanz
Copy link
Collaborator

Yea. In order for tsan to work properly all parts of your binary need to be built with it enabled, and the runtime initialization stub gets statically linked into the main executable.

@MarijnS95
Copy link
Contributor Author

MarijnS95 commented Dec 21, 2022

@llvm-beanz thanks for the suggestion, at least it's linking now with RUSTFLAGS=-Zsanitizer=thread and after disabling some other conflicting bits of code in our codebase we now get a dump that's almost 100k lines long...

Quite a lot of noise pointing into the rayon iterator parallelizer library, but we might be able to start with grepping on SUMMARY and libdxc:

EDIT: Note again that this is a build based on top of 24ca1f4, haven't yet tested if our SPIR-V issues have been resolved on main.

EDIT2: Deleted the output, it was from a build without tsan in dxc 🙃

Unsure if it is useful/possible to catch the hang this way, and if it has an effect on whatever tsan would emit.

@MarijnS95
Copy link
Contributor Author

MarijnS95 commented Dec 21, 2022

Hmm, now that I'm running with tsan I've ended up killing our app after generating 700MiB of output. Doing a unique sort on SUMMARY boils it down significantly though:

EDIT: this is on top of 24ca1f4 with the std::call_once suggestion from above.

$ rg SUMMARY tsan | sort -u
...
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/include/llvm/Support/ManagedStatic.h:66:17 in llvm::ManagedStatic<llvm::sys::SmartMutex<true> >::operator*()
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/lib/DxcSupport/Unicode.cpp:114:3 in WideCharToMultiByte(unsigned int, unsigned int, wchar_t const*, int, char*, int, char const*, bool*)
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/lib/Support/Mutex.cpp:86:58 in llvm::sys::MutexImpl::acquire()
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/lib/Support/Mutex.cpp:89:19 in llvm::sys::MutexImpl::acquire()
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/lib/Support/raw_ostream.cpp:691:7 in llvm::raw_fd_ostream::preferred_buffer_size() const
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/lib/Support/Unix/Signals.inc:122:7 in RegisterHandlers()
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/tools/clang/lib/Sema/SemaDeclCXX.cpp:10120:3 in clang::Sema::DeclareImplicitCopyAssignment(clang::CXXRecordDecl*)
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/tools/clang/lib/Sema/SemaDeclCXX.cpp:10945:3 in clang::Sema::DeclareImplicitCopyConstructor(clang::CXXRecordDecl*)
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/tools/clang/lib/Sema/SemaDeclCXX.cpp:6485:5 in clang::Sema::AddImplicitlyDeclaredMembersToClass(clang::CXXRecordDecl*)
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/tools/clang/lib/Sema/SemaDeclCXX.cpp:6488:5 in clang::Sema::AddImplicitlyDeclaredMembersToClass(clang::CXXRecordDecl*)
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/tools/clang/lib/Sema/SemaDeclCXX.cpp:6505:5 in clang::Sema::AddImplicitlyDeclaredMembersToClass(clang::CXXRecordDecl*)
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/tools/clang/lib/Sema/SemaDeclCXX.cpp:6526:5 in clang::Sema::AddImplicitlyDeclaredMembersToClass(clang::CXXRecordDecl*)
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/tools/clang/lib/Sema/SemaDeclCXX.cpp:8972:3 in clang::Sema::DeclareImplicitDefaultConstructor(clang::CXXRecordDecl*)
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/tools/clang/lib/Sema/SemaDeclCXX.cpp:9440:3 in clang::Sema::DeclareImplicitDestructor(clang::CXXRecordDecl*)
SUMMARY: ThreadSanitizer: data race /newdata/packages/directx-shader-compiler-git/src/DirectXShaderCompiler/tools/clang/lib/Sema/SemaHLSL.cpp in ConvertDimensions(ArTypeInfo, ArTypeInfo, clang::ImplicitConversionKind&, TYPE_CONVERSION_REMARKS&)
SUMMARY: ThreadSanitizer: heap-use-after-free /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../include/c++/12.2.0/bits/char_traits.h:431:33 in std::char_traits<char>::copy(char*, char const*, unsigned long)

@damyanp
Copy link
Member

damyanp commented Sep 26, 2024

This PR seems to have been inactive for quite a while now, so I'm going to close it. Feel free to reactive if you want to rebase it and continue working on a solution in DXC.

@damyanp damyanp closed this Sep 26, 2024
@MarijnS95
Copy link
Contributor Author

MarijnS95 commented Sep 26, 2024

We're still building wit this patch to prevent the aforementioned issues. I don't have permissions to reopen (what you call "reactive"..ate?), and rebasing breaks the reopening feature because GitHub things you want to use the branch for something else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

libdxcompiler.so locks up when used in many threads at once
5 participants