Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CRASH]使用tarhash 频繁给一个key设置带EX 的field。当filed过期,但该key占用的内存大小并没有改变,再次访问已过期的key,造成redis奔溃,重启 # #58

Open
ida1kd opened this issue Oct 8, 2023 · 6 comments

Comments

@ida1kd
Copy link

ida1kd commented Oct 8, 2023

Crash report

------ STACK TRACE ------
EIP:
/lib64/libc.so.6(gsignal+0x10f)[0x7f341940337f]

Backtrace:
/lib64/libpthread.so.0(+0x12c20)[0x7f34197a3c20]
/lib64/libc.so.6(gsignal+0x10f)[0x7f341940337f]
/lib64/libc.so.6(abort+0x127)[0x7f34193eddb5]
/lib64/libc.so.6(+0x21c89)[0x7f34193edc89]
/lib64/libc.so.6(+0x2fa76)[0x7f34193fba76]
/opt/tair/TairHash/lib/tairhash_module.so(+0x2585)[0x7f3418339585]
/opt/tair/TairHash/lib/tairhash_module.so(+0x1a7c)[0x7f3418338a7c]
/opt/tair/TairHash/lib/tairhash_module.so(+0x52bb)[0x7f341833c2bb]
/opt/tair/TairHash/lib/tairhash_module.so(+0x75b2)[0x7f341833e5b2]
./redis-server *:6379(RedisModuleCommandDispatcher+0x43)[0x4a79a3]
./redis-server *:6379(call+0x140)[0x55b1d0]
./redis-server *:6379(processCommand+0xa54)[0x55c214]
./redis-server *:6379(processInputBuffer+0xe9)[0x53de99]
./redis-server *:6379(readQueryFromClient+0x338)[0x53e3c8]
./redis-server *:6379[0x45e2ca]
./redis-server *:6379[0x460a6e]
./redis-server *:6379(aeMain+0x109)[0x564859]
./redis-server *:6379(main+0x39a)[0x450bea]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7f34193ef493]
./redis-server *:6379(_start+0x2e)[0x45131e]

------ REGISTERS ------
89654:M 08 Oct 2023 06:09:15.560 #
RAX:0000000000000000 RBX:0000000000000006
RCX:00007f341940337f RDX:0000000000000000
RDI:0000000000000002 RSI:00007ffe0b491470
RBP:00007f3419556698 RSP:00007ffe0b491470
R8 :0000000000000000 R9 :00007ffe0b491470
R10:0000000000000008 R11:0000000000000246
R12:00007f3418348fb0 R13:00007f3418349068
R14:00000000000000c3 R15:00007f3418c70420
RIP:00007f341940337f EFL:0000000000000246
CSGSFS:002b000000000033
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b49147f) -> 0000000000000000
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b49147e) -> 0000000000000000
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b49147d) -> 0000000000000000
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b49147c) -> 00000000011a39cc
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b49147b) -> 00000000011a38a0
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b49147a) -> 00000000011a39cc
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b491479) -> 00000000011a3913
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b491478) -> 00000000011a38a0
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b491477) -> 00000000011a38a0
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b491476) -> 00000000011a3905
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b491475) -> 00000000011a38a0
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b491474) -> 00000000fbad8000
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b491473) -> 00007f34194454a7
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b491472) -> 00007f3418c70420
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b491471) -> 00000000011a38a0
89654:M 08 Oct 2023 06:09:15.560 # (00007ffe0b491470) -> 0000000000000000

------ INFO OUTPUT ------

Server

redis_version:7.2.1
redis_git_sha1:cc244370
redis_git_dirty:0
redis_build_id:a595560e21015e74
redis_mode:standalone
os:Linux 4.18.0-348.7.1.el8_5.x86_64 x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:8.5.0
process_id:89654
process_supervised:no
run_id:6433d781d25f5024537e4ceb314f4f1d07174736
tcp_port:6379
server_time_usec:1696759755559414
uptime_in_seconds:895
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:2260939
executable:/ida/redis-stack-server-7.2.0-v2/bin/./redis-server
config_file:
io_threads_active:0
listener0:name=tcp,bind=,bind=-::,port=6379

Clients

connected_clients:1
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0

Memory

used_memory:1156664
used_memory_human:1.10M
used_memory_rss:13881344
used_memory_rss_human:13.24M
used_memory_peak:1351896
used_memory_peak_human:1.29M
used_memory_peak_perc:85.56%
used_memory_overhead:951656
used_memory_startup:928928
used_memory_dataset:205008
used_memory_dataset_perc:90.02%
allocator_allocated:1606704
allocator_active:1839104
allocator_resident:4907008
total_system_memory:16573603840
total_system_memory_human:15.44G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:64512
used_memory_vm_total_human:63.00K
used_memory_functions:184
used_memory_scripts:184
used_memory_scripts_human:184B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.14
allocator_frag_bytes:232400
allocator_rss_ratio:2.67
allocator_rss_bytes:3067904
rss_overhead_ratio:2.83
rss_overhead_bytes:8974336
mem_fragmentation_ratio:12.03
mem_fragmentation_bytes:12727216
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_total_replication_buffers:0
mem_clients_slaves:0
mem_clients_normal:22400
mem_cluster_links:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.3.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

Persistence

loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:50
rdb_bgsave_in_progress:0
rdb_last_save_time:1696758860
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:2
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:0
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

Stats

total_connections_received:3
total_commands_processed:89
instantaneous_ops_per_sec:0
total_net_input_bytes:5143
total_net_output_bytes:623519
total_net_repl_input_bytes:0
total_net_repl_output_bytes:0
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:8
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:1
dump_payload_sanitizations:0
total_reads_processed:93
total_writes_processed:96
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:3
reply_buffer_expands:0
eventloop_cycles:10443
eventloop_duration_sum:690677
eventloop_duration_cmd_sum:3189
instantaneous_eventloop_cycles_per_sec:10
instantaneous_eventloop_duration_usec:66
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0

Replication

role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:cc698a37ce4d31daed6c88336761552ce1cbff62
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

CPU

used_cpu_sys:0.129024
used_cpu_user:0.630906
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:0.134329
used_cpu_user_main_thread:0.624881

Modules

module:name=tairhash,ver=1,api=1,filters=0,usedby=[],using=[],options=[]

Commandstats

cmdstat_exhget:calls=3,usec=42,usec_per_call=14.00,rejected_calls=0,failed_calls=0
cmdstat_exhset:calls=23,usec=586,usec_per_call=25.48,rejected_calls=0,failed_calls=0
cmdstat_exhgetall:calls=13,usec=147,usec_per_call=11.31,rejected_calls=0,failed_calls=0
cmdstat_memory|usage:calls=46,usec=307,usec_per_call=6.67,rejected_calls=0,failed_calls=0
cmdstat_command|docs:calls=3,usec=2102,usec_per_call=700.67,rejected_calls=0,failed_calls=0
cmdstat_keys:calls=1,usec=5,usec_per_call=5.00,rejected_calls=0,failed_calls=0

Errorstats

errorstat_ERR:count=1

Latencystats

latency_percentiles_usec_exhget:p50=8.031,p99=27.007,p99.9=27.007
latency_percentiles_usec_exhset:p50=16.063,p99=87.039,p99.9=87.039
latency_percentiles_usec_exhgetall:p50=12.031,p99=14.015,p99.9=14.015
latency_percentiles_usec_memory|usage:p50=6.015,p99=27.007,p99.9=27.007
latency_percentiles_usec_command|docs:p50=704.511,p99=811.007,p99.9=811.007
latency_percentiles_usec_keys:p50=5.023,p99=5.023,p99.9=5.023

Cluster

cluster_enabled:0

Keyspace

db0:keys=2,expires=0,avg_ttl=0

------ CLIENT LIST OUTPUT ------
id=8 addr=127.0.0.1:56550 laddr=127.0.0.1:6379 fd=8 name= age=74 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=78 qbuf-free=20396 argv-mem=55 multi-mem=0 rbs=1024 r=6 obl=0 oll=0 omem=0 tot-mem=22479 events=r cmd=exhget user=default redir=-1 resp=2 lib-name= lib-ver=

------ CURRENT CLIENT INFO ------
id=8 addr=127.0.0.1:56550 laddr=127.0.0.1:6379 fd=8 name= age=74 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=78 qbuf-free=20396 argv-mem=55 multi-mem=0 rbs=1024 r=6 obl=0 oll=0 omem=0 tot-mem=22479 events=r cmd=exhget user=default redir=-1 resp=2 lib-name= lib-ver=
argc: '3'
argv[0]: '"exhget"'
argv[1]: '"us:123456"'
argv[2]: '"session:00022580259e42a699bc241c8e537803"'
89654:M 08 Oct 2023 06:09:15.560 # key 'us:123456' found in DB containing the following object:
89654:M 08 Oct 2023 06:09:15.560 # Object type: 5
89654:M 08 Oct 2023 06:09:15.560 # Object encoding: 0
89654:M 08 Oct 2023 06:09:15.560 # Object refcount: 1

------ EXECUTING CLIENT INFO ------
id=8 addr=127.0.0.1:56550 laddr=127.0.0.1:6379 fd=8 name= age=74 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=78 qbuf-free=20396 argv-mem=55 multi-mem=0 rbs=1024 r=6 obl=0 oll=0 omem=0 tot-mem=22479 events=r cmd=exhget user=default redir=-1 resp=2 lib-name= lib-ver=
argc: '3'
argv[0]: '"exhget"'
argv[1]: '"us:123456"'
argv[2]: '"session:00022580259e42a699bc241c8e537803"'
89654:M 08 Oct 2023 06:09:15.560 # key 'us:123456' found in DB containing the following object:
89654:M 08 Oct 2023 06:09:15.560 # Object type: 5
89654:M 08 Oct 2023 06:09:15.560 # Object encoding: 0
89654:M 08 Oct 2023 06:09:15.560 # Object refcount: 1

------ MODULES INFO OUTPUT ------

tairhash_Statistics

tairhash_active_expire_enable:1
tairhash_active_expire_period:1000
tairhash_active_expire_keys_per_loop:1000
tairhash_active_expire_dbs_per_loop:16
tairhash_active_expire_last_time_msec:0
tairhash_active_expire_max_time_msec:1
tairhash_active_expire_avg_time_msec:0
tairhash_passive_expire_keys_per_loop:3

tairhash_ActiveExpiredFields

tairhash_db0:42

tairhash_PassiveExpiredFields

tairhash_db0:0

------ CONFIG DEBUG OUTPUT ------
io-threads 1
repl-diskless-load disabled
activedefrag no
lazyfree-lazy-user-flush no
lazyfree-lazy-eviction no
repl-diskless-sync yes
client-query-buffer-limit 1gb
lazyfree-lazy-user-del no
lazyfree-lazy-expire no
slave-read-only yes
proto-max-bulk-len 512mb
replica-read-only yes
lazyfree-lazy-server-del no
list-compress-depth 0
io-threads-do-reads no
sanitize-dump-payload no

------ FAST MEMORY TEST ------
89654:M 08 Oct 2023 06:09:15.560 # Bio worker thread #0 terminated
89654:M 08 Oct 2023 06:09:15.561 # Bio worker thread #1 terminated
89654:M 08 Oct 2023 06:09:15.561 # Bio worker thread #2 terminated
*** Preparing to test memory region 8eb000 (2273280 bytes)
*** Preparing to test memory region 1171000 (270336 bytes)
*** Preparing to test memory region 7f3410000000 (135168 bytes)
*** Preparing to test memory region 7f3415367000 (2621440 bytes)
*** Preparing to test memory region 7f3415800000 (8388608 bytes)
*** Preparing to test memory region 7f3416000000 (2097152 bytes)
*** Preparing to test memory region 7f3416334000 (8388608 bytes)
*** Preparing to test memory region 7f3416b35000 (8388608 bytes)
*** Preparing to test memory region 7f3417336000 (8388608 bytes)
*** Preparing to test memory region 7f3417b37000 (8388608 bytes)
*** Preparing to test memory region 7f3418551000 (8192 bytes)
*** Preparing to test memory region 7f3418800000 (8388608 bytes)
*** Preparing to test memory region 7f34193cb000 (4096 bytes)
*** Preparing to test memory region 7f341978d000 (16384 bytes)
*** Preparing to test memory region 7f34199ad000 (16384 bytes)
*** Preparing to test memory region 7f3419e95000 (20480 bytes)
*** Preparing to test memory region 7f341a12d000 (4096 bytes)
*** Preparing to test memory region 7f341aad5000 (32768 bytes)
*** Preparing to test memory region 7f341aae0000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

------ DUMPING CODE AROUND EIP ------
Symbol: gsignal (base: 0x7f3419403270)
Module: /lib64/libc.so.6 (base 0x7f34193cc000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=0x7f3419403270 -D -b binary -m i386:x86-64 /tmp/dump.bin

89654:M 08 Oct 2023 06:09:15.731 # dump of function (hexdump of 399 bytes):
f30f1efa5349c7c0ffffffff89fb41ba0800000031ff4881ec1001000064488b042528000000488984240801000031c04989e148b8ffffff7ffeffffff4c8984248800000048898424800000004c89ca488db424800000b80e0000004c898424900000004c898424980000004c898424a00000004c898424a80000004c898424b00000004c898424b80000004c898424c00000004c898424c80000004c898424d00000004c898424d8000000898424e00000004c898424e80000004c898424f00000004c898424f80000000f05b92700000089c80f054889c7b8ba0000000f0589c689dab8ea0000000f05483d00f0ffff773b4189c041ba0800000031d24c89cebf000000b80e0000000f05488b8c24080100006448330c25280000004489c075194881c4100100005bc36690488b15c17a3800f7d8648902ebbae8dd5f0d00662e0f1f8400000000000f1f00f30f1efa85ff7808f7dfe902000090488b05917a380064c70016000000b8ffffffffc3662e0f1f84000000000066909066662e0f1f8400000000000f1f40
Function at 0x7f34194d9390 is __stack_chk_fail

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

   Please report the crash by opening an issue on github:

       http://github.com/redis/redis/issues

If a Redis module was involved, please open in the module's repo instead.

Suspect RAM error? Use redis-server --test-memory to verify it.

Some other issues could be detected by redis-server --check-system
已放弃 (核心已转储)

Additional information

  1. OS distribution and version:centos8
  2. Steps to reproduce (if any)
    首先连续给 us:123456 这个key添加field,并这是过期时间
    exhset us:123456 session:00022580259e42a699bc241c8e537800 "" EX 120
    exhset us:123456 session:00022580259e42a699bc241c8e537801 "" EX 120
    exhset us:123456 session:00022580259e42a699bc241c8e537802 "" EX 120
    exhset us:123456 session:00022580259e42a699bc241c8e537803 "" EX 120
    exhset us:123456 session:00022580259e42a699bc241c8e537804 "" EX 120
    exhset us:123456 session:00022580259e42a699bc241c8e537805 "" EX 120

一共添加 25个不一样的field,并且设置EX 为 120秒
再添加一个超时时间久一点的field
exhset us:123456 session:00022580259e42a699bc241c8e537805 "" EX 12000
产生问题1:
当field都过期后,key中只有一条数据,占用内存依旧很大。内存空间没有被释放
127.0.0.1:6379> memory usage us:123456
(integer) 920
产生问题2:
当再次 exhget us:123456 session:00022580259e42a699bc241c8e537803 其中一个field 或者添加一个于已过期filed一样的field时候,精经常性触发 redis重启!!!!!
期待得到回复,尤其是内存空间不被释放问题

@ida1kd
Copy link
Author

ida1kd commented Oct 8, 2023

redis-server: /opt/tair/TairHash/src/slabapi.c:195: slab_expireDelete: Assertion `find_node->slab != NULL' failed.

@chenyang8094
Copy link
Collaborator

redis-server: /opt/tair/TairHash/src/slabapi.c:195: slab_expireDelete: Assertion `find_node->slab != NULL' failed.

请问你测试用的是哪个版本?

@ida1kd
Copy link
Author

ida1kd commented Nov 14, 2023

redis-server:/opt/tair/TairHash/src/slabapi.c:195:slab_expireDelete:断言“find_node->slab!= NULL”失败。

请问你测试用的是哪个版本?

v1.0.0

@chenyang8094
Copy link
Collaborator

redis-server:/opt/tair/TairHash/src/slabapi.c:195:slab_expireDelete:断言“find_node->slab!= NULL”失败。

请问你测试用的是哪个版本?

v1.0.0

收到,我们已经定位到问题原因,这两天会提交Bugfix,感谢bug report。

@cyjaysong
Copy link

修复了嘛修复了嘛

@chenyang8094
Copy link
Collaborator

修复了嘛修复了嘛

你好,请先改用其他模式,slab模式还存在其他稳定性问题,等过几天空闲了我会紧急修复下,非常抱歉。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants