[core] add --enable-prof-libunwind to jemalloc build #48671

Merged: 1 commit, merged Nov 13, 2024

@rynewang commented Nov 9, 2024

This fixes a jemalloc profiling deadlock, the same one ClickHouse hit in ClickHouse/ClickHouse#66346.

Stacktrace:

Thread 36 (Thread 0x7f77d3798700 (LWP 4673)):
#0  __lll_lock_wait (futex=futex@entry=0x7f7ae01333a0 <object_mutex>, private=0) at lowlevellock.c:52
#1  0x00007f7ae06a10a3 in __GI___pthread_mutex_lock (mutex=0x7f7ae01333a0 <object_mutex>) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f7ae012dd88 in __gthread_mutex_lock (__mutex=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at ./gthr-default.h:749
#3  _Unwind_Find_registered_FDE (bases=0x7f77d3792058, bases@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pc=0x7f7ae012be5b <_Unwind_Backtrace+59>, pc@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind-dw2-fde.c:1049
#4  _Unwind_Find_FDE (pc=0x7f7ae012be5b <_Unwind_Backtrace+59>, bases=bases@entry=0x7f77d3792058) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind-dw2-fde-dip.c:459
#5  0x00007f7ae0129e08 in uw_frame_state_for (context=0x7f77d3791fb0, fs=0x7f77d3791e00) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind-dw2.c:1263
#6  0x00007f7ae012b060 in uw_init_context_1 (context=0x7f77d3791fb0, outer_cfa=0x7f77d3792260, outer_ra=0x7f7ae0739446) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind-dw2.c:1592
#7  0x00007f7ae012be5c in _Unwind_Backtrace (trace=0x7f7ae072ed70, trace_argument=0x7f77d3792260) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind.inc:295
#8  0x00007f7ae0739446 in ?? () from /usr/lib/x86_64-linux-gnu/libjemalloc.so.2
#9  0x00007f7ae06d6045 in ?? () from /usr/lib/x86_64-linux-gnu/libjemalloc.so.2
#10 0x00007f7ae012d6be in start_fde_sort (count=1284, accu=0x7f77d3792360) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind-dw2-fde.c:443
#11 init_object (ob=0x7f79a9bd5fe0) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind-dw2-fde.c:802
#12 search_object (ob=0x7f79a9bd5fe0, pc=<optimized out>) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind-dw2-fde.c:992
#13 0x00007f7ae012de76 in _Unwind_Find_registered_FDE (bases=0x7f77d37926d8, bases@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pc=0x7f7ae012be5b <_Unwind_Backtrace+59>, pc@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind-dw2-fde.c:1069
#14 _Unwind_Find_FDE (pc=0x7f7ae012be5b <_Unwind_Backtrace+59>, bases=bases@entry=0x7f77d37926d8) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind-dw2-fde-dip.c:459
#15 0x00007f7ae0129e08 in uw_frame_state_for (context=0x7f77d3792630, fs=0x7f77d3792480) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind-dw2.c:1263
#16 0x00007f7ae012b060 in uw_init_context_1 (context=0x7f77d3792630, outer_cfa=0x7f77d37928e0, outer_ra=0x7f7ae0739446) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind-dw2.c:1592
#17 0x00007f7ae012be5c in _Unwind_Backtrace (trace=0x7f7ae072ed70, trace_argument=0x7f77d37928e0) at /opt/conda/conda-bld/gcc-compiler_1654084175708/work/gcc/libgcc/unwind.inc:295
#18 0x00007f7ae0739446 in ?? () from /usr/lib/x86_64-linux-gnu/libjemalloc.so.2
#19 0x00007f7ae06d6045 in ?? () from /usr/lib/x86_64-linux-gnu/libjemalloc.so.2
#20 0x00000000004d6e91 in _PyMem_RawMalloc (size=<optimized out>, ctx=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.10.14/Objects/obmalloc.c:91

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
@rynewang (author) commented:

I found this while debugging with jemalloc profiling enabled: our Bazel config for jemalloc needs this flag, otherwise profiling deadlocks.

@rynewang rynewang enabled auto-merge (squash) November 13, 2024 01:09
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Nov 13, 2024
@rynewang rynewang merged commit df15c58 into ray-project:master Nov 13, 2024
6 checks passed
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this pull request Nov 14, 2024
This fixes a jemalloc profiling deadlock, as hit by clickhouse
ClickHouse/ClickHouse#66346

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this pull request Nov 15, 2024
This fixes a jemalloc profiling deadlock, as hit by clickhouse
ClickHouse/ClickHouse#66346

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>