Fix pthread_mutex_trylock deadlock in jemalloc #2727
Merged
What problem does this PR solve?
Issue Number: resolve #2726
Problem Summary:
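#2692 did not use __dl_sym because the UTs could not run with it. The error was: symbol lookup error: ./libbrpc.so: undefined symbol: pthread_mutex_trylock. Related issues: #2266, #1086.
Summary of the cause: libpthread.so is loaded before libbrpc.dbg.so, so a __dl_sym lookup with RTLD_NEXT cannot find the pthread_mutex_trylock symbol in the dynamic libraries loaded after it.
There are two solutions:
libbrpc.dbg.so is loaded before libpthread.so.
Detailed analysis: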
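The man page describes what RTLD_NEXT does. In this scenario it roughly means: look up the pthread_mutex_* symbols in the dynamic libraries that come after libbrpc.dbg.so in load order. So libbrpc.dbg.so must be loaded before libpthread.so for the pthread_mutex_* family of symbols to be found.
For debugging, I built the brpc_channel_unittest program on the master branch. For readability, the output is lightly processed (filtered and trimmed).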
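The RTLD_NEXT behaviour described above can be sketched in a few lines of C. This is only an illustration, not the brpc code: it uses the public dlsym rather than __dl_sym, and the helper names next_has_symbol and resolve_next_trylock are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* RTLD_NEXT is a GNU extension in <dlfcn.h> */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>

/* Function-pointer type matching pthread_mutex_trylock. */
typedef int (*mutex_trylock_fn)(pthread_mutex_t *);

/* Hypothetical helper: returns 1 if some object that comes AFTER the object
 * containing this code in the search order exports `name`, else 0. */
static int next_has_symbol(const char *name) {
    assert(name != NULL);
    return dlsym(RTLD_NEXT, name) != NULL;
}

/* Hypothetical helper: resolve the "next" pthread_mutex_trylock.
 * The result is NULL when no later object defines the symbol, which is
 * the situation when libpthread.so was loaded before the hooking library. */
static mutex_trylock_fn resolve_next_trylock(void) {
    mutex_trylock_fn fn = NULL;
    /* POSIX-suggested idiom for storing a dlsym result in a function pointer. */
    *(void **)(&fn) = dlsym(RTLD_NEXT, "pthread_mutex_trylock");
    return fn;
}
```

Because the search starts after the calling object, the same call can succeed or fail depending purely on load order, which is why the order of libbrpc.dbg.so and libpthread.so matters here. On older glibc versions the program may need to be linked with -ldl.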
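LD_DEBUG=libs shows the load order of the dynamic libraries: libpthread.so is loaded before libbrpc.dbg.so.
It also shows that dlsym hits the same error, but dlsym does not make the process exit; it reports the error through dlerror (the deadlock in #2726 is caused by the memory allocation on this path). So at this point sys_pthread_mutex_trylock is NULL. The UTs probably do not crash only because none of the UTs or the libraries they depend on use pthread_mutex_trylock.
On the other hand, there are no errors for pthread_mutex_lock and pthread_mutex_unlock; in other words, their symbols can be found. So where do these two symbols come from? I added one line of code to make the binding information for pthread_mutex_lock and pthread_mutex_unlock easy to spot. With LD_DEBUG=bindings,libs, it turns out that pthread_mutex_lock and pthread_mutex_unlock are bound to libc.so.6 (see the output between the two pthread_mutex_trylock errors).
Searching libc.so.6 for pthread_mutex_* symbols confirms that it has no pthread_mutex_trylock symbol:
nm -D /usr/lib/x86_64-linux-gnu/libc.so.6 | grep pthread_mutex
0000000000094480 T pthread_mutex_destroy
00000000000944b0 T pthread_mutex_init
00000000000944e0 T pthread_mutex_lock
0000000000094510 T pthread_mutex_unlock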
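To make the dlsym/dlerror behaviour and the "which library does this symbol come from" question concrete, here is a small sketch. It is not taken from the PR: checked_next_lookup and providing_object are hypothetical helper names, and dladdr is used as a programmatic alternative to reading LD_DEBUG=bindings output.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for RTLD_NEXT and dladdr on glibc */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: look up `name` with RTLD_NEXT. A failed lookup does
 * not terminate the process; dlsym returns NULL and dlerror() describes the
 * failure. The message (or an empty string on success) is copied into `err`. */
static void *checked_next_lookup(const char *name, char *err, size_t errlen) {
    assert(name != NULL);
    dlerror(); /* clear any stale error state */
    void *addr = dlsym(RTLD_NEXT, name);
    const char *msg = dlerror();
    if (err != NULL && errlen > 0) {
        snprintf(err, errlen, "%s", msg != NULL ? msg : "");
    }
    return addr;
}

/* Hypothetical helper: path of the shared object that contains `addr`,
 * or NULL if the address is NULL or cannot be matched to a loaded object. */
static const char *providing_object(void *addr) {
    Dl_info info;
    if (addr == NULL || dladdr(addr, &info) == 0) {
        return NULL;
    }
    return info.dli_fname;
}
```

A caller would check the returned pointer before using it, since a NULL result (as with sys_pthread_mutex_trylock above) only crashes once something actually calls through it.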
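The pthread_mutex_* functions in libc.so are most likely stub functions; see [1] [2] [3]. In this scenario, even though pthread_mutex_lock and pthread_mutex_unlock resolve to the wrong functions and pthread_mutex_trylock is NULL, the process is not affected: because libpthread.so is loaded first, the pthread_mutex_* symbols the process uses all come from libpthread.so, which means the hooks in libbrpc.dbg.so are not in effect.
What is changed and the side effects?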
Changed:
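Use __dl_sym to load pthread_mutex_trylock, which avoids the deadlock in the malloc library. To use it, one of the following must hold:
libbrpc.dbg.so is loaded before libpthread.so. (The UTs use this approach.)
Use the NO_PTHREAD_MUTEX_HOOK macro to turn off the pthread_mutex_* hooks. With the hooks off, the only effect is that the contention profiler cannot collect pthread_mutex contention, which is acceptable.
Side effects: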
Performance effects:
Breaking backward compatibility:
Check List:
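As a rough illustration of the NO_PTHREAD_MUTEX_HOOK switch listed under "Changed", the sketch below shows how a compile-time macro can bypass a hooked trylock path. Only the macro name comes from this PR; demo_mutex_trylock and g_contended are hypothetical, the sketch uses plain dlsym instead of __dl_sym, and returning ENOSYS on a failed lookup is an assumption made for the example, not brpc's behaviour.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* RTLD_NEXT is a GNU extension in <dlfcn.h> */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

typedef int (*mutex_trylock_fn)(pthread_mutex_t *);

/* Hypothetical contention counter (not thread-safe; illustration only). */
static unsigned long g_contended = 0;

/* Hypothetical wrapper standing in for a hooked trylock. */
static int demo_mutex_trylock(pthread_mutex_t *m) {
    assert(m != NULL);
#ifdef NO_PTHREAD_MUTEX_HOOK
    /* Hooks disabled: call the system function directly. Nothing is
     * counted, so a profiler built on the counter sees no contention. */
    return pthread_mutex_trylock(m);
#else
    static mutex_trylock_fn sys_trylock = NULL;
    if (sys_trylock == NULL) {
        *(void **)(&sys_trylock) = dlsym(RTLD_NEXT, "pthread_mutex_trylock");
    }
    if (sys_trylock == NULL) {
        /* Assumption for this sketch: refuse instead of calling via NULL. */
        return ENOSYS;
    }
    int rc = sys_trylock(m);
    if (rc == EBUSY) {
        ++g_contended; /* the mutex was already held: record contention */
    }
    return rc;
#endif
}
```

Building with -DNO_PTHREAD_MUTEX_HOOK removes the lookup entirely, which mirrors the trade-off described above: no dependency on load order, at the cost of losing pthread_mutex contention data.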