HANG in elfutils libdw in looking up line numbers with drsyms #6611
Comments
Debug build of DR works fine. DrMemory doesn't seem to be running release-build tests pre-commit? What about DR which does do release-build tests: did drcov tests all pass? What is different here?
I spent some time digging further into this. There is some weird pthread lock behavior. __readers == 0xb has 2 set for this single-threaded app! It is set to 0x8 and then 0xb here:
That's that same Dwarf_Abbrev_Hash_find calling:
Which apparently will unlock either a write or read lock. But it seems
A higher-level approach: it reproduces on DR on the

I tracked it down by tweaking debug build to look like release. When I disabled DEBUG_MEMORY and HEAP_ACCOUNTING, debug build then reproduced the hang, even at

Narrowing further: it's the memset for the mmap in privload_tls_init(). Narrowing within that 2-page region: it's offset 0x19d0, which turns into

It seems that if the tid field is zero, the pthread lock checks for whether a lock is owned incorrectly conclude that the lock is indeed owned, since the owner field of the lock is zero and that matches the (invalid) tid of zero. This explains the weird lock behavior in the callstack above at the hang point.
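To illustrate the failure mode described in this comment, here is a minimal sketch of an ownership check that compares a lock's owner field against the calling thread's cached tid. Everything in it is hypothetical and simplified: `toy_thread_t`, `toy_lock_t`, and both functions are made-up names for illustration, not glibc's actual structures or code, and the real check in glibc may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread TLS block; NOT glibc's layout. */
typedef struct {
    int32_t tid; /* thread id cached in TLS; stays 0 if never initialized */
} toy_thread_t;

/* Hypothetical stand-in for a lock that records its holder. */
typedef struct {
    int32_t owner; /* tid of the current holder; 0 means unlocked */
} toy_lock_t;

/* Ownership test: "do I already hold this lock?"
 * If self->tid was never set (still 0), an unlocked lock (owner == 0)
 * compares equal, so the lock wrongly looks owned by the caller. */
static bool toy_lock_owned_by_self(const toy_lock_t *lock,
                                   const toy_thread_t *self)
{
    return lock->owner == self->tid;
}

/* The kind of initialization a loader would be expected to perform:
 * store a real, non-zero tid into the TLS field. */
static void toy_thread_init_tid(toy_thread_t *self, int32_t real_tid)
{
    self->tid = real_tid;
}
```

With a zeroed thread block and a zeroed lock, `toy_lock_owned_by_self()` returns true even though nothing holds the lock; after `toy_thread_init_tid()` stores a non-zero tid, the same check returns false.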
I thought this may be related to #5437 and the libc/ld GLRO vars: but the variables initialized there are not in the TLS mmap and are unrelated. This is related to that issue in the sense that it's coming from the merging of pthreads into libc and having too tight of a coupling with ld.so with undocumented initialization between them, making it very hard to replace ld.so as DR is doing. For a short term solution, my plan is to locate this field (
Fixes a hang on glibc 2.37 by initializing the tid pthread TLS field. Its offset is located by decoding an exported function known to reference it in a new routine privload_set_pthread_tls_fields(). Tested on a glibc 2.37 machine where without this fix the client.drcallstack test hangs in release build. Also tested on a Dr. Memory 2.6.19737 build pointing at a release build DR with this fix and confirmed it fixes the hang there. Fixes #6611
Fixes a hang on glibc 2.37 by initializing the tid pthread TLS field. Its offset is located by decoding an exported function known to reference it in a new routine privload_set_pthread_tls_fields(). Only x86 is supported with this fix as no aarch64 machine with the required glibc is available for developing and testing the decode fix. A debug-build warning is printed for glibc 2.37+ on non-x86. Tested on a glibc 2.37 machine where without this fix the client.drcallstack test hangs in release build. Also tested on a Dr. Memory 2.6.19737 build pointing at a release build DR with this fix and confirmed it fixes the hang there. Fixes #6611
Updates DR to b42b82b1d to fix a problem with DR's private loader with glibc 2.37 and the new elfutils libdw when addresses are looked up. Issue: DynamoRIO/dynamorio#6611
Noticed this in Dr. Memory where, with the update to use elfutils libdw, release builds seem to hang when looking up line numbers:
Let it run w/ breakpoints in drsyms_dw.c -- never returns out of libdw.
Seems to be stuck here inside Dwarf_Abbrev_Hash_find at
third_party/elfutils/lib/dynamicsizehash_concurrent.c:476:
htab is maybe not initialized properly?