Thread Sanitizer FATAL error on kernel version 6.6.6-x #1716
Comments
More information:
$ ./run.me & cat /proc/$!/maps
6004e9242000-6004e9243000 r--p 00000000 fc:01 4099534 /home/USER/workspace/sanitizer_test/cmake-build-debug/run.me
6004e9243000-6004e9244000 r-xp 00001000 fc:01 4099534 /home/USER/workspace/sanitizer_test/cmake-build-debug/run.me
6004e9244000-6004e9245000 r--p 00002000 fc:01 4099534 /home/USER/workspace/sanitizer_test/cmake-build-debug/run.me
6004e9245000-6004e9247000 rw-p 00002000 fc:01 4099534 /home/USER/workspace/sanitizer_test/cmake-build-debug/run.me
75b06ee00000-75b06ee9c000 r--p 00000000 fc:01 34999046 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.32
75b06ee9c000-75b06efcd000 r-xp 0009c000 fc:01 34999046 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.32
75b06efcd000-75b06f05a000 r--p 001cd000 fc:01 34999046 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.32
75b06f05a000-75b06f05b000 ---p 0025a000 fc:01 34999046 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.32
75b06f05b000-75b06f069000 rw-p 0025a000 fc:01 34999046 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.32
75b06f069000-75b06f06d000 rw-p 00000000 00:00 0
75b06f119000-75b06f127000 r--p 00000000 fc:01 34998039 /usr/lib/x86_64-linux-gnu/libm.so.6
75b06f127000-75b06f1a3000 r-xp 0000e000 fc:01 34998039 /usr/lib/x86_64-linux-gnu/libm.so.6
75b06f1a3000-75b06f1fe000 r--p 0008a000 fc:01 34998039 /usr/lib/x86_64-linux-gnu/libm.so.6
75b06f1fe000-75b06f200000 rw-p 000e4000 fc:01 34998039 /usr/lib/x86_64-linux-gnu/libm.so.6
75b06f200000-75b06f228000 r--p 00000000 fc:01 34999014 /usr/lib/x86_64-linux-gnu/libtsan.so.2.0.0
75b06f228000-75b06f2e6000 r-xp 00028000 fc:01 34999014 /usr/lib/x86_64-linux-gnu/libtsan.so.2.0.0
75b06f2e6000-75b06f31c000 r--p 000e6000 fc:01 34999014 /usr/lib/x86_64-linux-gnu/libtsan.so.2.0.0
75b06f31c000-75b06f31d000 ---p 0011c000 fc:01 34999014 /usr/lib/x86_64-linux-gnu/libtsan.so.2.0.0
75b06f31d000-75b06f328000 rw-p 0011c000 fc:01 34999014 /usr/lib/x86_64-linux-gnu/libtsan.so.2.0.0
75b06f328000-75b070274000 rw-p 00000000 00:00 0
75b0702b2000-75b0702d9000 r--p 00000000 fc:01 60032318 /etc/ld.so.cache
75b0702d9000-75b0702db000 rw-p 00000000 00:00 0
75b0702db000-75b0702dd000 r--p 00000000 fc:01 34997065 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
75b0702dd000-75b070307000 r-xp 00002000 fc:01 34997065 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
75b070307000-75b070312000 r--p 0002c000 fc:01 34997065 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
75b070313000-75b070317000 rw-p 00037000 fc:01 34997065 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
7fff9b1be000-7fff9b1df000 rw-p 00000000 00:00 0 [stack]
7fff9b1e7000-7fff9b1eb000 r--p 00000000 00:00 0 [vvar]
7fff9b1eb000-7fff9b1ed000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
FATAL: ThreadSanitizer: unexpected memory mapping 0x6004e9242000-0x6004e9243000 |
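To see which entry of the maps listing the FATAL message refers to, the address range at the start of each line can be parsed and compared against the reported address. The following is only a minimal sketch: `Mapping`, `ParseMapsLine` and `Contains` are hypothetical helper names introduced for illustration and are not part of TSan.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// One address range taken from a /proc/<pid>/maps line (hypothetical type).
struct Mapping {
  std::uint64_t start;  // inclusive
  std::uint64_t end;    // exclusive
  std::string perms;    // e.g. "r--p"
};

// Parses the leading "start-end perms" fields of a maps line.
// Returns std::nullopt if the line does not have that shape.
std::optional<Mapping> ParseMapsLine(const std::string &line) {
  std::istringstream in(line);
  std::string range, perms;
  if (!(in >> range >> perms)) return std::nullopt;
  const auto dash = range.find('-');
  if (dash == std::string::npos) return std::nullopt;
  try {
    Mapping m;
    m.start = std::stoull(range.substr(0, dash), nullptr, 16);
    m.end = std::stoull(range.substr(dash + 1), nullptr, 16);
    m.perms = perms;
    return m;
  } catch (const std::exception &) {
    return std::nullopt;  // not valid hexadecimal
  }
}

// True if addr lies inside the half-open range [start, end).
bool Contains(const Mapping &m, std::uint64_t addr) {
  return addr >= m.start && addr < m.end;
}
```

Applied to the listing above, the first line (the read-only segment of run.me) is the one whose range matches the address printed in the FATAL message.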
And more info; disabling virtual memory address randomization fixes the issue:
$ TSAN_OPTIONS="verbosity=3" setarch `uname -m` -R ./run.me
==142298==Installed the sigaction for signal 11
==142298==Installed the sigaction for signal 7
==142298==Installed the sigaction for signal 8
==142298==Using libbacktrace symbolizer.
***** Running under ThreadSanitizer v3 (pid 142298) *****
ThreadSanitizer: growing sync allocator: 0 out of 1048576*1024
ThreadSanitizer: growing heap block allocator: 0 out of 262144*4096
==142298==__tls_get_addr: DTLS_Find 0x7ffff7e84ec8 2
==142298==__tls_get_addr: DTLS_NextBlock 0x7ffff7e84ec8 0
==142298==__tls_get_addr: 0x7ffff6c64fa0 {0x2,0x0} => 0x7ffff7e84ee0; tls_beg: 0x7ffff7e84ee0; sp: 0x7fffffffd9c0 num_live_dtls 1
==142298==__tls_get_addr: static tls: 0x7ffff7e84ee0
==142298==__tls_get_addr: DTLS_Find 0x7ffff7e84ec8 2
Stats: SizeClassAllocator64: 0M mapped (0M rss) in 513 allocations; remains 513
02 ( 32): mapped: 64K allocs: 128 frees: 0 inuse: 128 num_freed_chunks 1920 avail: 2048 rss: 4K releases: 0 last released: 0K region: 0x7b0800000000
04 ( 64): mapped: 64K allocs: 128 frees: 0 inuse: 128 num_freed_chunks 896 avail: 1024 rss: 4K releases: 0 last released: 0K region: 0x7b1000000000
05 ( 80): mapped: 64K allocs: 128 frees: 0 inuse: 128 num_freed_chunks 691 avail: 819 rss: 4K releases: 0 last released: 0K region: 0x7b1400000000
06 ( 96): mapped: 64K allocs: 128 frees: 0 inuse: 128 num_freed_chunks 554 avail: 682 rss: 4K releases: 0 last released: 0K region: 0x7b1800000000
49 ( 81920): mapped: 128K allocs: 1 frees: 0 inuse: 1 num_freed_chunks 0 avail: 1 rss: 4K releases: 0 last released: 0K region: 0x7bc400000000
Stats: LargeMmapAllocator: allocated 0 times, remains 0 (0 K) max 0 M; by size logs: |
This may or may not be considered a TSAN "bug" per se; regardless, it may be worth considering modifying the error message to give a hint to the user about a possible workaround. Instead of:
FATAL: ThreadSanitizer: unexpected memory mapping 0x6004e9242000-0x6004e9243000
Print out something like:
FATAL: ThreadSanitizer: unexpected memory mapping 0x6004e9242000-0x6004e9243000; as a potential workaround, you might consider disabling virtual memory address randomization and retrying.
That's a trivial change I can make myself, and open a pull request. Before I do that, I would like to ask TSAN maintainer(s) whether or not they agree with modifying the error message. |
May be related to https://bugs.chromium.org/p/chromium/issues/detail?id=1496730 |
Many people might not be able to view that bug. In any case, I prototyped the re-exec approaches (disabling ASLR or re-exec'ing until the layout is compatible) and it will need some more work. The problem is that, with high-entropy ASLR, it occasionally segfaults before it can reach the "FATAL: ThreadSanitizer: unexpected memory mapping" error. Essentially, the checks in CheckAndProtect() are happening too late. @rohumm Btw, in many cases, lowering the ASLR entropy ( |
Specifically, the allocator is initialized before CheckAndProtect happens. This means if the randomized layout conflicts with the allocator's intended location, it will segfault. I'll upload a patch (to avoid the segfault and enable re-exec) for review in a day or two. |
TSan's shadow mappings only support 30 bits of ASLR entropy on x86, and it is not practical to support the maximum of 32 bits (due to pointer compression and the overhead of shadow mappings). Instead, this patch changes TSan to re-exec without ASLR if it encounters an incompatible memory layout, as suggested by Dmitry in google/sanitizers#1716. If ASLR is already disabled, it will abort. This patch involves a bit of refactoring, because the old code is: InitializePlatformEarly() InitializeAllocator() InitializePlatform(): CheckAndProtect() but it may already segfault during InitializeAllocator() if the memory layout is incompatible, before we get a chance to check in CheckAndProtect(). This patch adds CheckAndProtect() during InitializePlatformEarly(), before the allocator is initialized. Naturally, it is necessary to ensure that CheckAndProtect() does *not* allow the heap regions to be occupied there, hence we generalize CheckAndProtect() to optionally check the heap regions. We keep the original behavior of CheckAndProtect() in InitializePlatform() as a last line of defense. We need to be careful not to prematurely abort if ASLR is disabled but TSan was going to re-exec for other reasons (e.g., unlimited stack size); we implement this by moving all the re-exec logic into ReExecIfNeeded().
…78351) TSan's shadow mappings only support 30 bits of ASLR entropy on x86 Linux, and it is not practical to support the maximum of 32 bits (due to pointer compression and the overhead of shadow mappings). Instead, this patch changes TSan to re-exec without ASLR if it encounters an incompatible memory layout, as suggested by Dmitry in google/sanitizers#1716. If ASLR is already disabled but the memory layout is still incompatible, it will abort.
This patch involves a bit of refactoring, because the old code is:
1. InitializePlatformEarly()
2. InitializeAllocator()
3. InitializePlatform(): CheckAndProtect()
but it may already segfault during InitializeAllocator() if the memory layout is incompatible, before we get a chance to check in CheckAndProtect(). This patch adds CheckAndProtect() during InitializePlatformEarly(), before the allocator is initialized. Naturally, it is necessary to ensure that CheckAndProtect() does *not* allow the heap regions to be occupied here, hence we generalize CheckAndProtect() to optionally check the heap regions. We keep the original behavior of CheckAndProtect() in InitializePlatform() as a last line of defense.
We need to be careful not to prematurely abort if ASLR is disabled but TSan was going to re-exec for other reasons (e.g., unlimited stack size); we implement this by moving all the re-exec logic into ReExecIfNeeded().
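The "re-exec without ASLR" idea described in this patch can be illustrated with the Linux personality(2) call, which is also what `setarch -R` relies on. This is a rough sketch under stated assumptions and is not the code from the patch: `AslrDisabledIn` and `ReExecWithoutAslr` are hypothetical names, and the real logic lives in TSan's ReExecIfNeeded().

```cpp
#include <sys/personality.h>
#include <unistd.h>

// Pure helper: does this persona value already have ASLR disabled?
bool AslrDisabledIn(int persona) {
  return (persona & ADDR_NO_RANDOMIZE) != 0;
}

// Tries to restart the current program with ASLR disabled.
// On success this call does not return, because the process image is replaced.
// Returns false if ASLR was already disabled (a re-exec would not help)
// or if any step failed.
bool ReExecWithoutAslr(char **argv) {
  // personality(0xffffffff) queries the current persona without changing it.
  const int old_persona = personality(0xffffffff);
  if (old_persona == -1) return false;
  if (AslrDisabledIn(old_persona)) return false;
  // The persona is inherited across exec, so set the flag and then re-exec.
  if (personality(old_persona | ADDR_NO_RANDOMIZE) == -1) return false;
  execv("/proc/self/exe", argv);
  return false;  // only reached if execv failed
}
```

A caller would invoke `ReExecWithoutAslr(argv)` only after detecting an incompatible layout, and abort if it returns false. Note that some sandboxes filter the personality() call, in which case this approach cannot work and the function simply returns false.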
I can confirm similar issues on a 6.7 kernel (Arch Linux). clang version:
~/dev/playground % clang -v
clang version 16.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-pc-linux-gnu/13.2.1
Found candidate GCC installation: /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1
Selected GCC installation: /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
~/dev/playground % uname -a
Linux framefrog 6.7.0-arch3-1 #1 SMP PREEMPT_DYNAMIC Sat, 13 Jan 2024 14:37:14 +0000 x86_64 GNU/Linux
~/dev/playground %
I cannot claim a deep understanding here: I only recently started using TSan and diving into low-level code, so I can barely use TSan to debug data races. Because of that, if the solution turns out to be changing the error message, could the new message include a link to some resource on the topic, or more concrete suggestions for flags to use? That would make finding a workaround a smidge easier for someone entirely new to the topic. Mentioning e.g. earlier mentioned (On that note though, running |
A workaround that I've been using (note that this does decrease the security of your system, do not use on production systems) is to disable ASLR entirely:
|
To confirm the root cause, could you please paste the output from:
? The patch to automatically re-execute the TSan'ified process without randomization, if needed, landed in llvm/llvm-project@0784b1e. A less risky alternative to disabling ASLR for all of Linux is to slightly reduce the amount of randomization:
|
I also experienced this problem under Linux 6.7.0 and can confirm that disabling ASLR for the process with
|
@jeremiahar Thanks for posting the output! That shows your Linux is maxing out the randomization (32 bits), which is incompatible with TSan (max supported is 30 bits on x86 Linux). |
I am not entirely sure if this suffices or if maybe the problem changed with Linux 6.7.0.
~/dev/playground % sudo sysctl vm.mmap_rnd_bits
vm.mmap_rnd_bits = 30
~/dev/playground % echo "int main(void){}" > hello.c
~/dev/playground % clang -o hello -fsanitize=thread -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer hello.c
~/dev/playground % ./hello
FATAL: ThreadSanitizer: unexpected memory mapping 0x7d9ee6e00000-0x7d9ee718e000
This does work if I go down to 28 (I cannot select a lower value).
As a double check, I went back to 29 and tried again; it starts failing again:
~/dev/playground % sudo sysctl vm.mmap_rnd_bits=29
vm.mmap_rnd_bits = 29
~/dev/playground % clang -o hello -fsanitize=thread -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer hello.c
~/dev/playground % ./hello
FATAL: ThreadSanitizer: unexpected memory mapping 0x5709a7666000-0x5709a7686000
OS: Arch-Linux, CPU: AMD Ryzen 7 7840U |
What version is your clang? The TSan fix to support 30 bits of ASLR landed last November (llvm/llvm-project@7d039ef), so my initial guess is you have an older version of TSan that only supports 28 bits. The Linux 6.7 kernel has a default setting of 28 bits of entropy (https://elixir.bootlin.com/linux/v6.7/source/arch/x86/Kconfig). It's likely customized by your distro to be higher (which is generally a good thing). |
Clang version is 16.0.6.
~/dev/playground % clang -v
clang version 16.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-pc-linux-gnu/13.2.1
Found candidate GCC installation: /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1
Selected GCC installation: /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/13.2.1
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
~/dev/playground % uname -a
Linux framefrog 6.7.0-arch3-1 #1 SMP PREEMPT_DYNAMIC Sat, 13 Jan 2024 14:37:14 +0000 x86_64 GNU/Linux
~/dev/playground %
The package appears to have last been updated this January: https://archlinux.org/packages/extra/x86_64/clang/ |
@PhilippMDoerner Thank you for sharing all the output! clang 16.0.6 is based on June 13, 2023 source from upstream (https://github.com/llvm/llvm-project/releases/tag/llvmorg-16.0.6), so it doesn't have either of the recent TSan ASLR compatibility fixes (hence TSan is limited to 28 bits of ASLR). |
Thanks for the clarification! I was trying to figure out what the age of that clang-version was by skimming through docs and other txt files in the repo to little success, this helps immensely in that regard. Seems like it'll take a bit then until arch catches up to that particular fix. For now I think 28 shouldn't be too bad (?). Better than turning it off for sure I'd imagine. I'll note it down in a related stack-overflow question. |
On Arch with kernel 6.7.1-arch1-1, changing |
Same situation. Arch, kernel 6.7.6-arch1-1. Changing |
I'm on Ubuntu 23.10 and today I had a kernel update that led me here because it broke all my code too. I also had to drop down to 28 bits from 32 to make my test suite finally start passing again. Not only did tsan break but asan and ubsan also broke in mysterious ways as well. When the time comes, I guess I'll eventually increment it back up to 30 or 32 but until then, I can't complain too much. |
I reported this in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2056762 for Ubuntu's recent kernel update, which also bumped I observed the same as people above: ThreadSanitizer fails with "unexpected memory mapping" when |
To sum up: older versions of TSan are not compatible with very high ASLR entropy, which is the default in some recent Linux distros. The workaround is to reduce ASLR entropy (commenters above report success after lowering vm.mmap_rnd_bits to 28). Newer versions of TSan (LLVM 18.1.0 onwards: llvm/llvm-project@0784b1e) will automatically re-exec without ASLR if the layout is incompatible. Additionally, TSan in LLVM 18.1.0 onwards can support 30 bits of ASLR entropy without disabling ASLR (llvm/llvm-project@7d039ef). |
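As a quick way to check a machine against the limits quoted in this thread, the current entropy setting can be read from procfs and compared with the number of bits a given TSan build is said to support. This is a minimal sketch: the helper names and the two constants are assumptions for illustration, with 28 and 30 taken from the maintainer comments above for x86-64 Linux rather than from TSan's source.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Limits as quoted in this thread for x86-64 Linux (assumed, not read from TSan).
constexpr int kOlderTsanMaxBits = 28;  // e.g. the clang 16.0.6 case above
constexpr int kNewerTsanMaxBits = 30;  // LLVM 18.1.0 onwards

// Reads the kernel's mmap ASLR entropy setting, in bits.
// Returns std::nullopt if the file is missing or unreadable.
std::optional<int> ReadMmapRndBits(
    const std::string &path = "/proc/sys/vm/mmap_rnd_bits") {
  std::ifstream in(path);
  int bits = 0;
  if (!(in >> bits)) return std::nullopt;
  return bits;
}

// True if the configured entropy does not exceed what the TSan build supports.
bool EntropyWithinLimit(int bits, int max_supported_bits) {
  return bits <= max_supported_bits;
}
```

For example, a value of 32 exceeds both limits, which matches the reports above of failures on distros that raised the default. The check is only a heuristic, since the actual failure depends on where the kernel happens to place each mapping.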
The ThreadSanitizer version currently available from Fedora 39 repositories is unable to cope with very high ASLR entropy, which is the default in some recent Linux distributions [1]. This causes all TSAN-enabled builds to fail on the affected systems with an error like: FATAL: ThreadSanitizer: unexpected memory mapping 0x7d00e0772000-0x7d00e0c00000 Work around the problem by reducing ASLR entropy for all TSAN-enabled builds until the problem is resolved upstream. [1] google/sanitizers#1716 (cherry picked from commit 05b09f2)
Since Linux kernel 6.5 we are getting false positives from the CI. Lower the ASLR entropy to disable ASLR, which works as a temporary workaround. google/sanitizers#1716 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2056762 closes official-stockfish#5115 No functional change
…it tests. There is an incompatibility between address sanitizer and recent Ubuntu versions that causes an eternal loop of deadly signals when running unit tests in a local environment. See this issue for more info: google/sanitizers#1716
We were hitting tsan failures, see https://lab.llvm.org/buildbot/#/builders/131/builds/63556. Looks like these are due to google/sanitizers#1716 and bumping toolchain to llvm-18 fixes the issue.
ThreadSanitizer doesn't cope well with newer kernel (>= 6.6.x) when ASLR is enabled: google/sanitizers#1716 This disables ASLR locally around the fedora-threadsan tasks. Signed-off-by: Daiki Ueno <ueno@gnu.org>
…ou can watch the detail in the website google/sanitizers#1716 Finish
C++ source file:
Build commands tried:
clang-16 -std=c++17 -fsanitize=thread -g -O0 main.cpp -o run.me
clang-18 -std=c++23 -fsanitize=thread -g -O0 main.cpp -o run.me
g++-9 -std=c++17 -fsanitize=thread -g -O0 main.cpp -o run.me
g++-11 -std=c++23 -fsanitize=thread -g -O0 main.cpp -o run.me
g++-13 -std=c++23 -fsanitize=thread -g -O0 main.cpp -o run.me
Error:
$ ./run.me
FATAL: ThreadSanitizer: unexpected memory mapping 0x70a076c72000-0x70a077100000
$ TSAN_OPTIONS="verbosity=3" ./run.me
FATAL: ThreadSanitizer: unexpected memory mapping 0x63068b744000-0x63068b773000
System info:
$ uname -a
Linux pop-os 6.6.6-76060606-generic #202312111032~1702306143~22.04~d28ffec SMP PREEMPT_DYNAMIC Mon D x86_64 x86_64 x86_64 GNU/Linux
Misc.: