Segmentation fault when vm.overcommit_memory is set to 2 #138
Similar results with the "manual" compressor when trying to compress the Matrix HQ room.
# RUST_BACKTRACE=1 RUST_LOG=debug LD_PRELOAD=/usr/lib64/libjemalloc.so.2 ./synapse_compress_state -p "user=postgres dbname=matrix host=/run/postgresql" -r '!OGEhHVWSdvArJzumhm:matrix.org' -o out.sql -t -n 500 -b 170833
[2024-04-09T23:43:16Z INFO synapse_compress_state] Fetching state from DB for room '!OGEhHVWSdvArJzumhm:matrix.org'...
[2024-04-09T23:43:16Z DEBUG tokio_postgres::prepare] preparing query s0: SELECT id FROM (SELECT id FROM state_groups WHERE room_id = $1 AND id > $2 ORDER BY id ASC LIMIT $3) AS ids ORDER BY ids.id DESC LIMIT 1
[2024-04-09T23:43:16Z DEBUG tokio_postgres::query] executing statement s0 with parameters: ["!OGEhHVWSdvArJzumhm:matrix.org", Some(170833), Some(500)]
[2024-04-09T23:43:16Z DEBUG tokio_postgres::prepare] preparing query s1:
SELECT m.id, prev_state_group, type, state_key, s.event_id
FROM state_groups AS m
LEFT JOIN state_groups_state AS s ON (m.id = s.state_group)
LEFT JOIN state_group_edges AS e ON (m.id = e.state_group)
WHERE m.room_id = $1 AND m.id <= $2
AND m.id > $3
[2024-04-09T23:43:16Z DEBUG tokio_postgres::query] executing statement s1 with parameters: ["!OGEhHVWSdvArJzumhm:matrix.org", 173804, 170833]
[2m] 14444321 rows retrieved
[2024-04-09T23:45:33Z DEBUG synapse_compress_state::database] Got initial state from database. Checking for any missing state groups...
[2024-04-09T23:45:33Z INFO synapse_compress_state] Fetched state groups up to 173804
[2024-04-09T23:45:33Z INFO synapse_compress_state] Number of state groups: 500
[2024-04-09T23:45:33Z INFO synapse_compress_state] Number of rows in current table: 14444035
[2024-04-09T23:45:33Z INFO synapse_compress_state] Compressing state...
[00:01:32] ████████████████████ 500/500 state groups
[2024-04-09T23:47:06Z INFO synapse_compress_state] Number of rows after compression: 2943697 (20.38%)
[2024-04-09T23:47:06Z INFO synapse_compress_state] Compression Statistics:
[2024-04-09T23:47:06Z INFO synapse_compress_state] Number of forced resets due to lacking prev: 29
[2024-04-09T23:47:06Z INFO synapse_compress_state] Number of compressed rows caused by the above: 2680484
[2024-04-09T23:47:06Z INFO synapse_compress_state] Number of state groups changed: 161
[2024-04-09T23:47:06Z INFO synapse_compress_state] Checking that state maps match...
[00:00:00] ░░░░░░░░░░░░░░░░░░░░ 0/500 state groups
Segmentation fault (core dumped)
Coredump
# gdb /opt/rust-synapse-compress-state/target/debug/synapse_compress_state --core /root/core.synapse_compres.0.90ce0546c6a44f6ea07a0538e09d8004.1952675.1712706710000000
GNU gdb (GDB) Red Hat Enterprise Linux 10.2-11.1.el9_3
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/rust-synapse-compress-state/target/debug/synapse_compress_state...
[New LWP 1952986]
[New LWP 1952675]
[New LWP 1952977]
[New LWP 1952976]
[New LWP 1952978]
[New LWP 1952979]
[New LWP 1952980]
[New LWP 1952981]
[New LWP 1952985]
[New LWP 1952975]
[New LWP 1952982]
[New LWP 1952983]
[New LWP 1952984]
[New LWP 1952987]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--
Core was generated by `./synapse_compress_state -p user=postgres dbname=matrix host=/run/postgresql -r'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 ___pthread_mutex_trylock (mutex=mutex@entry=0x28e8) at pthread_mutex_trylock.c:34
34 switch (__builtin_expect (PTHREAD_MUTEX_TYPE_ELISION (mutex),
[Current thread is 1 (Thread 0x7f8317ff4640 (LWP 1952986))]
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /opt/rust-synapse-compress-state/target/debug/synapse_compress_state.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) bt
#0 ___pthread_mutex_trylock (mutex=mutex@entry=0x28e8) at pthread_mutex_trylock.c:34
#1 0x00007f831a080a08 in malloc_mutex_trylock_final (mutex=0x28a8) at include/jemalloc/internal/mutex.h:157
#2 malloc_mutex_lock (mutex=0x28a8, tsdn=0x7f8317ff2f88) at include/jemalloc/internal/mutex.h:216
#3 je_tcache_arena_associate (tsdn=tsdn@entry=0x7f8317ff2f88, tcache_slow=tcache_slow@entry=0x7f8317ff3088, tcache=tcache@entry=0x7f8317ff32e0, arena=arena@entry=0x0) at src/tcache.c:588
#4 0x00007f831a08442b in arena_choose_impl.constprop.1 (tsd=0x7f8317ff2f88, arena=<optimized out>, internal=false) at include/jemalloc/internal/jemalloc_internal_inlines_b.h:60
#5 0x00007f831a01d397 in arena_choose (arena=0x0, tsd=0x7f8317ff2f88) at include/jemalloc/internal/jemalloc_internal_inlines_b.h:88
#6 tcache_alloc_small (slow_path=<optimized out>, zero=true, binind=2, size=32, tcache=0x7f8317ff32e0, arena=0x0, tsd=0x7f8317ff2f88) at include/jemalloc/internal/tcache_inlines.h:56
#7 arena_malloc (slow_path=<optimized out>, tcache=0x7f8317ff32e0, zero=true, ind=2, size=32, arena=0x0, tsdn=0x7f8317ff2f88) at include/jemalloc/internal/arena_inlines_b.h:151
#8 iallocztm (slow_path=<optimized out>, arena=0x0, is_internal=false, tcache=0x7f8317ff32e0, zero=true, ind=2, size=32, tsdn=0x7f8317ff2f88) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:55
#9 imalloc_no_sample (ind=2, usize=32, size=32, tsd=0x7f8317ff2f88, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2398
#10 imalloc_body (tsd=0x7f8317ff2f88, dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2573
#11 imalloc (dopts=<optimized out>, sopts=<optimized out>) at src/jemalloc.c:2687
#12 calloc (num=num@entry=1, size=size@entry=32) at src/jemalloc.c:2852
#13 0x00007f8319657b43 in __cxa_thread_atexit_impl (func=0xff8c83398676680f, obj=0x7f8317ff4460, dso_symbol=0x5620ea26d730 <_rust_extern_with_linkage___dso_handle>) at cxa_thread_atexit_impl.c:107
#14 0x00005620ea009899 in std::sys::unix::stack_overflow::imp::signal_handler ()
#15 <signal handler called>
#16 ___pthread_mutex_trylock (mutex=mutex@entry=0x28e8) at pthread_mutex_trylock.c:34
#17 0x00007f831a080a08 in malloc_mutex_trylock_final (mutex=0x28a8) at include/jemalloc/internal/mutex.h:157
#18 malloc_mutex_lock (mutex=0x28a8, tsdn=0x7f8317ff2f88) at include/jemalloc/internal/mutex.h:216
#19 je_tcache_arena_associate (tsdn=tsdn@entry=0x7f8317ff2f88, tcache_slow=tcache_slow@entry=0x7f8317ff3088, tcache=tcache@entry=0x7f8317ff32e0, arena=arena@entry=0x0) at src/tcache.c:588
#20 0x00007f831a0859a8 in arena_choose_impl (arena=<optimized out>, internal=false, tsd=0x7f8317ff2f88) at include/jemalloc/internal/jemalloc_internal_inlines_b.h:60
#21 arena_choose_impl (arena=0x0, internal=false, tsd=0x7f8317ff2f88) at include/jemalloc/internal/jemalloc_internal_inlines_b.h:32
#22 arena_choose (arena=0x0, tsd=0x7f8317ff2f88) at include/jemalloc/internal/jemalloc_internal_inlines_b.h:88
#23 je_tsd_tcache_data_init.isra.0 (tsd=0x7f8317ff2f88) at src/tcache.c:740
#24 0x00007f831a085df9 in je_tsd_tcache_enabled_data_init (tsd=<optimized out>) at src/tcache.c:644
#25 0x00007f831a085e8c in je_tsd_fetch_slow.constprop.0 (minimal=minimal@entry=false, tsd=<optimized out>) at src/tsd.c:311
#26 0x00007f831a024445 in tsd_fetch_impl (minimal=false, init=true) at include/jemalloc/internal/tsd.h:422
#27 tsd_fetch () at include/jemalloc/internal/tsd.h:448
#28 imalloc (dopts=<synthetic pointer>, sopts=<synthetic pointer>) at src/jemalloc.c:2681
#29 realloc (ptr=ptr@entry=0x0, size=size@entry=32) at src/jemalloc.c:3653
#30 0x00007f83196a0bea in __pthread_getattr_np (thread_id=140201020048960, attr=0x7f8317ff23d0) at pthread_getattr_np.c:181
#31 0x00005620ea00a2a1 in std::sys::unix::thread::guard::current ()
--Type <RET> for more, q to quit, c to continue without paging--
#32 0x00005620e9e56523 in std::thread::{impl#0}::spawn_unchecked_::{closure#1}<rayon_core::registry::{impl#2}::spawn::{closure_env#0}, ()> () at /builddir/build/BUILD/rustc-1.71.1-src/library/std/src/thread/mod.rs:527
#33 0x00005620e9e2316f in core::ops::function::FnOnce::call_once<std::thread::{impl#0}::spawn_unchecked_::{closure_env#1}<rayon_core::registry::{impl#2}::spawn::{closure_env#0}, ()>, ()> ()
at /builddir/build/BUILD/rustc-1.71.1-src/library/core/src/ops/function.rs:250
#34 0x00005620ea00a095 in std::sys::unix::thread::Thread::new::thread_start ()
#35 0x00007f831969f802 in start_thread (arg=<optimized out>) at pthread_create.c:443
#36 0x00007f831963f450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Interesting. After I
...segfault no longer happens. The behavior seems weird regardless, but I've got nothing more to add to this except that the older version of didn't fix segfaults before I did the other steps. If it's impossible to investigate and no one else is able to reproduce this, I think this issue can be closed.
From the stack trace, sounds like it could be a problem with jemallocator. So using a different version is probably a reasonable workaround. What do you mean by 'mock-compiled' one?
I can't believe I forgot to post about the actual reason this kept happening. It was very likely this sysctl parameter: vm.overcommit_memory set to 2. I suppose the compressor crashing with that option enabled isn't intended behavior, but I don't know if it's worth the trouble of fixing either.
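For anyone hitting the same crash, here is a minimal sketch of how a Rust program could check the overcommit policy at startup and warn about it. It is not part of synapse_compress_state; the `OvercommitMode` enum and `parse_overcommit_mode` helper are hypothetical names made up for illustration. The only facts assumed are that Linux exposes the setting at /proc/sys/vm/overcommit_memory and that the values 0, 1 and 2 mean heuristic, always and strict accounting respectively.

```rust
use std::fs;

/// Kernel overcommit policy as stored in /proc/sys/vm/overcommit_memory.
/// (Hypothetical helper type, not taken from the compressor's code.)
#[derive(Debug, PartialEq, Eq)]
enum OvercommitMode {
    /// 0: heuristic overcommit (the usual default)
    Heuristic,
    /// 1: always overcommit
    Always,
    /// 2: strict accounting, allocations can fail once the commit limit is hit
    Never,
}

/// Parse the raw file contents (for example "2\n") into a mode.
/// Returns None for anything other than 0, 1 or 2.
fn parse_overcommit_mode(raw: &str) -> Option<OvercommitMode> {
    match raw.trim() {
        "0" => Some(OvercommitMode::Heuristic),
        "1" => Some(OvercommitMode::Always),
        "2" => Some(OvercommitMode::Never),
        _ => None,
    }
}

fn main() {
    match fs::read_to_string("/proc/sys/vm/overcommit_memory") {
        Ok(raw) => match parse_overcommit_mode(&raw) {
            Some(OvercommitMode::Never) => eprintln!(
                "warning: vm.overcommit_memory=2 (strict accounting); \
                 allocator initialisation may fail under this setting"
            ),
            Some(mode) => println!("vm.overcommit_memory mode: {:?}", mode),
            None => eprintln!("unrecognised overcommit value: {:?}", raw.trim()),
        },
        // The file does not exist on non-Linux systems.
        Err(err) => eprintln!("could not read overcommit setting: {err}"),
    }
}
```

The backtrace above shows arena=0x0 and a mutex address of 0x28a8, which would be consistent with jemalloc dereferencing a null arena after a failed allocation, but that reading is an assumption and has not been confirmed here.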
mock is a handy tool for creating RPMs. Instead of working with the
@villepeh can this issue now be closed?
Yup. Apologies for the inconvenience!
I used Synapse Admin API to purge some rooms that no longer had local users. After that I started seeing panic messages like the ones in #79.
I tried deleting the compressor entries from the database but now I'm getting segfaults.
I cloned the repository again and rebuilt auto compressor with cargo build but the result is the same. Running the command with sudo -u postgres or root makes no difference.
I tried to get some debug info with GDB: