This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

fix(meta): bind task_tracker for each task in meta server #344

Merged
merged 56 commits into from
Dec 3, 2019

Conversation

Smityz
Contributor

@Smityz Smityz commented Nov 29, 2019

This bug was found by the unit test in #342.
In the unit test for the HTTP interface that queries the cold backup policy, I send a fake_rpc to add a policy here.
After my test finished,

[----------] 3 tests from meta_http_service_test
[ RUN      ] meta_http_service_test.get_app_from_primary
[       OK ] meta_http_service_test.get_app_from_primary (841 ms)
[ RUN      ] meta_http_service_test.get_app_envs
[       OK ] meta_http_service_test.get_app_envs (833 ms)
[ RUN      ] meta_http_service_test.get_backup_policy
[       OK ] meta_http_service_test.get_backup_policy (2146 ms)
[----------] 3 tests from meta_http_service_test (3820 ms total)

[----------] 3 tests from meta_split_service_test
[ RUN      ] meta_split_service_test.start_split_with_not_existed_app
got signal id: 11
Segmentation fault (core dumped)

a core dump occurred.
Here is the information from the core file:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./dsn.meta.test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000003254534500 in ?? ()
[Current thread is 1 (Thread 0x7f66059c4700 (LWP 8464))]
(gdb) where
#0  0x0000003254534500 in ?? ()
#1  0x000056338b47c42d in dsn::zlock::lock (this=this@entry=0x563390448620) at /home/smilencer/code/pegasus/rdsn/src/core/core/zlocks.cpp:103
#2  0x000056338b2b04c9 in dsn::zauto_lock::zauto_lock (lock=..., this=<synthetic pointer>) at /home/smilencer/code/pegasus/rdsn/include/dsn/tool-api/zlocks.h:121
#3  dsn::replication::policy_context::<lambda()>::operator() (__closure=0x56339b04501d) at /home/smilencer/code/pegasus/rdsn/src/dist/replication/meta_server/meta_backup_service.cpp:725
#4  std::_Function_handler<void(), dsn::replication::policy_context::issue_new_backup_unlocked()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /usr/include/c++/7/bits/std_function.h:316
#5  0x000056338b45ac79 in dsn::task::exec_internal (this=0x56339b044f55) at /home/smilencer/code/pegasus/rdsn/src/core/core/task.cpp:180
#6  0x000056338b47344a in dsn::task_worker::loop (this=0x56338cede240) at /home/smilencer/code/pegasus/rdsn/src/core/core/task_worker.cpp:211
#7  0x000056338b473669 in dsn::task_worker::run_internal (this=0x56338cede240) at /home/smilencer/code/pegasus/rdsn/src/core/core/task_worker.cpp:191
#8  0x00007f661300766f in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f66132da6db in start_thread (arg=0x7f66059c4700) at pthread_create.c:463
#10 0x00007f66126c488f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) f 3
#3  dsn::replication::policy_context::<lambda()>::operator() (__closure=0x56339b04501d) at /home/smilencer/code/pegasus/rdsn/src/dist/replication/meta_server/meta_backup_service.cpp:725
725	                             zauto_lock l(_lock);
(gdb) p this
$1 = (dsn::replication::policy_context * const) 0x563390448610
(gdb) l
720	
721	    if (!should_start_backup_unlocked()) {
722	        tasking::enqueue(LPC_DEFAULT_CALLBACK,
723	                         nullptr,
724	                         [this]() {
725	                             zauto_lock l(_lock);
726	                             issue_new_backup_unlocked();
727	                         },
728	                         0,
729	                         _backup_service->backup_option().issue_backup_interval_ms);
(gdb) p _lock
$2 = {_h = 0x56339439eb80}

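We then found that the call at line 723 passes nullptr where a task tracker is expected, so nothing tracks or cancels the unfinished task once the policy_context is destructed.
Following the same reasoning, we also resolved another bug that keeps occurring in our general unit tests: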

 Thread 1 (Thread 0x7efbc27fc700 (LWP 16882)):
#0  0x00007efbec0018f0 in ?? ()
#1  0x0000000000638f66 in dsn::replication::server_state::on_update_configuration_on_remote_reply (this=0x7efbee4cf8a0, ec=..., config_request=std::shared_ptr (count 1, weak 0) 0x2251df8) at /home/travis/build/XiaoMi/rdsn/src/dist/replication/meta_server/server_state.cpp:1573
#2  0x0000000000729cf9 in dsn::task::exec_internal (this=this@entry=0x216216d) at /home/travis/build/XiaoMi/rdsn/src/core/core/task.cpp:180
#3  0x000000000073de6d in dsn::task_worker::loop (this=0x20efa50) at /home/travis/build/XiaoMi/rdsn/src/core/core/task_worker.cpp:211
#4  0x000000000073e069 in dsn::task_worker::run_internal (this=0x20efa50) at /home/travis/build/XiaoMi/rdsn/src/core/core/task_worker.cpp:191
#5  0x00007efc1095da60 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007efc10bb8184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7  0x00007efc100c4ffd in clone () from /lib/x86_64-linux-gnu/libc.so.6
ERROR: run dsn.meta.test failed, return_code = 1

It is fixed by adding a tracker here.
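To illustrate why binding a tracker helps, here is a minimal single-threaded sketch. It is not rDSN's `task_tracker` or `tasking::enqueue`; `task_queue`, `simple_tracker`, `policy`, and `run_after_owner_destroyed` are hypothetical names made up for this example. The idea it assumes is only the one described above: a delayed callback that captures `this` must be cancelled when its owner is destructed, otherwise it runs against freed memory.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a delayed-task queue: tasks run only on drain().
class task_queue {
public:
    void enqueue(std::function<void()> t) { _tasks.push_back(std::move(t)); }
    void drain() {
        auto pending = std::move(_tasks);
        _tasks.clear();
        for (auto &t : pending) t();
    }
private:
    std::vector<std::function<void()>> _tasks;
};

// Hypothetical, simplified tracker. Wrapped callbacks share an "alive" flag
// with the tracker; once the tracker is destroyed they become no-ops.
// Assumption: single-threaded use, so there is no waiting for running tasks.
class simple_tracker {
public:
    simple_tracker() : _alive(std::make_shared<std::atomic<bool>>(true)) {}
    ~simple_tracker() { _alive->store(false); }
    std::function<void()> track(std::function<void()> fn) const {
        auto alive = _alive; // the flag outlives the tracker via shared_ptr
        return [alive, fn = std::move(fn)]() {
            if (alive->load()) fn();
        };
    }
private:
    std::shared_ptr<std::atomic<bool>> _alive;
};

// Owner that schedules a callback capturing `this`, as the lambda at
// line 724 of the listing does.
class policy {
public:
    policy(task_queue &q, int &counter) : _queue(q), _counter(counter) {}
    void schedule_retry() {
        // Without _tracker.track(...), this lambda could run after ~policy().
        _queue.enqueue(_tracker.track([this]() { ++_counter; }));
    }
private:
    task_queue &_queue;
    int &_counter;
    simple_tracker _tracker; // destroyed with the owner, cancelling its tasks
};

// Returns how many callbacks actually ran.
int run_after_owner_destroyed() {
    task_queue queue;
    int counter = 0;
    {
        policy p(queue, counter);
        p.schedule_retry();
        queue.drain(); // owner still alive: the callback runs
        assert(counter == 1);
        p.schedule_retry(); // still pending when p goes out of scope
    }
    queue.drain(); // owner destroyed: the pending callback is skipped
    return counter;
}
```

Calling `run_after_owner_destroyed()` runs one callback while the owner is alive and skips the one that was still pending at destruction. A production tracker would also have to wait for tasks that are already executing on another thread, which this sketch leaves out.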

@neverchanje neverchanje changed the title fix(tasking::enqueue): core dump occurs after policy_context/backup_service destructed fix(task): core dump occurs after policy_context/backup_service destructed Dec 3, 2019
@neverchanje neverchanje changed the title fix(task): core dump occurs after policy_context/backup_service destructed fix(meta): bind task_tracker for each task in meta server Dec 3, 2019
@acelyc111 acelyc111 merged commit 0de9e39 into XiaoMi:master Dec 3, 2019
@neverchanje neverchanje added the type/bug-fix This PR fixes a bug. label Dec 6, 2019
@Smityz Smityz deleted the tracker branch June 4, 2020 15:16
Labels
1.12.2 type/bug-fix This PR fixes a bug.

3 participants