Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed TSAN check in RegionKVStoreOldTest.ReadIndex lock-order-inversion (potential deadlock) #9091

Closed
JaySon-Huang opened this issue May 27, 2024 · 1 comment · Fixed by #9095
Assignees
Labels

Comments

@JaySon-Huang
Copy link
Contributor

JaySon-Huang commented May 27, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from RegionKVStoreOldTest
[ RUN      ] RegionKVStoreOldTest.ReadIndex
[2024/05/27 16:52:42.700 +08:00] [INFO] [TiFlashTestEnv.cpp:262] ["Clean path /DATA/disk1/jaysonhuang/tiflash-master/cmake-build-tsan/tmp/region_kvs_old_test for bootstrap"] [source=Test] [thread_id=1]
[2024/05/27 16:52:42.700 +08:00] [WARN] [ProxyFFI.cpp:917] ["Set mock rust ptr gc function"] [source=MockSetRustGcHelper] [thread_id=1]
[2024/05/27 16:52:42.700 +08:00] [INFO] [KVStore.cpp:70] ["KVStore inited, eager_raft_log_gc_enabled=false"] [thread_id=1]
[2024/05/27 16:52:42.700 +08:00] [INFO] [PageStorageImpl.cpp:54] ["PageStorageImpl start. Config{ PageStorageConfig {blob_file_limit_size: 268435456, blob_spacemap_type: 2, blob_heavy_gc_valid_rate: 0.500, blob_heavy_gc_valid_rate_raft_data: 0.050, blob_block_alignment_bytes: 0, wal_roll_size: 2097152, wal_max_persisted_log_files: 4} }"] [source=RegionPersister]
 [thread_id=1]
[2024/05/27 16:52:42.701 +08:00] [INFO] [PageDirectoryFactory.cpp:68] ["PageDirectory restored, max_page_id=0 max_applied_ver=0"] [source=RegionPersister] [thread_id=1]
[2024/05/27 16:52:42.701 +08:00] [INFO] [RegionPersister.cpp:381] ["RegionPersister running. Current Run Mode is ONLY_V3"] [thread_id=1]
[2024/05/27 16:52:42.701 +08:00] [INFO] [KVStore.cpp:89] ["Restored 0 regions"] [thread_id=1]
[2024/05/27 16:52:42.702 +08:00] [INFO] [KVStore.cpp:136] ["Parsed proxy config snap_handle_pool_size 4"] [thread_id=1]
[2024/05/27 16:52:42.703 +08:00] [INFO] [KVStore.cpp:384] ["Set store info id: 1234"] [thread_id=1]
[2024/05/27 16:52:42.703 +08:00] [INFO] [kvstore_helper.h:102] ["Finished setup"] [source=RegionKVStoreOldTest] [thread_id=1]
[2024/05/27 16:52:42.704 +08:00] [INFO] [RegionTable.cpp:57] ["get new table, keyspace=4294967295 table_id=1"] [thread_id=1]
[2024/05/27 16:52:42.704 +08:00] [INFO] [ReadIndex.cpp:339] ["Start to initialize read-index workers: worker count 64, runner count 1"] [thread_id=1]
[2024/05/27 16:52:42.705 +08:00] [INFO] [RegionTable.cpp:57] ["get new table, keyspace=4294967295 table_id=2"] [thread_id=1]
[2024/05/27 16:52:42.705 +08:00] [INFO] [ReadIndexWorkerManager.cpp:161] ["Start read-index runner 0"] [source=ReadIndexWorkers] [thread_id=146]
[2024/05/27 16:52:42.706 +08:00] [INFO] [ReadIndex.cpp:143] ["start to check regions ready, min-wait-tick 0.001s, max-wait-tick 20s, wait-region-ready-timeout 1200.000s"] [source=WaitCheckRegionReady] [thread_id=147]
[2024/05/27 16:52:42.706 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=1 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=146]
[2024/05/27 16:52:42.706 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=2 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=146]
[2024/05/27 16:52:42.706 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=3 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=146]
[2024/05/27 16:52:42.706 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=9 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=146]                                                                                                                            [2024/05/27 16:52:42.706 +08:00] [INFO] [ReadIndex.cpp:255] ["1 regions need to apply to latest index, sleep for 0.001s"] [source=WaitCheckRegionReady] [thread_id=147]
[2024/05/27 16:52:42.707 +08:00] [INFO] [ReadIndex.cpp:255] ["1 regions need to apply to latest index, sleep for 0.002s"] [source=WaitCheckRegionReady] [thread_id=147]                                                                                                                                                                                                     [2024/05/27 16:52:42.710 +08:00] [INFO] [ReadIndex.cpp:289] ["finish to check 4 regions, time cost 0.004s"] [source=WaitCheckRegionReady] [thread_id=147]                                                                                                                                                                                                                   [2024/05/27 16:52:42.710 +08:00] [INFO] [KVStore.cpp:355] ["Handle destroy [region_id=9 applied_term=6 applied_index=66]"] [thread_id=1]
[2024/05/27 16:52:42.710 +08:00] [INFO] [KVStore.cpp:258] ["Start to remove region_id=9"] [thread_id=1]                                                                                                                                                                                                                                                                     ==================
WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=3111683) 
 Cycle in lock order graph: M0 (0x7b4c00010500) => M1 (0x7b5400059100) => M2 (0x7b6800001410) => M0 

Mutex M1 acquired here while holding mutex M0 in thread T155:
    #0 pthread_mutex_lock /root/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1339:3 (gtests_dbms+0x2f7375b)                                                                                                                                                                                                                                                #1 std::__1::mutex::lock() <null> (libc++.so.1+0x8f7c5)
    #2 DB::MockRaftStoreProxy::makeReadIndexTask(kvrpcpb::ReadIndexRequest) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Debug/MockKVStore/MockRaftStoreProxy.cpp:85:14 (gtests_dbms+0xe45643b)
    #3 DB::MockFFIImpls::fn_make_read_index_task(DB::RaftStoreProxyPtr, DB::BaseBuffView) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Debug/MockKVStore/MockReadIndex.cpp:46:21 (gtests_dbms+0xe46decc)
    #4 DB::TiFlashRaftProxyHelper::makeReadIndexTask(kvrpcpb::ReadIndexRequest const&) const /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/Proxy.cpp:106:26 (gtests_dbms+0xd02d6e6)
    #5 DB::makeReadIndexTask(DB::TiFlashRaftProxyHelper const&, kvrpcpb::ReadIndexRequest&) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorkerImpl.h:124:23 (gtests_dbms+0xd043781)
    #6 DB::ReadIndexDataNode::runOneRound(DB::TiFlashRaftProxyHelper const&, std::__1::shared_ptr<DB::ReadIndexNotifyCtrl> const&) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexDataNode.cpp:107:30 (gtests_dbms+0xd043781)
    #7 DB::ReadIndexWorker::consumeRegionNotifies(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorker.cpp:138:15 (gtests_dbms+0xd04fc92)
    #8 DB::ReadIndexWorker::runOneRound(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorker.cpp:162:9 (gtests_dbms+0xd050098)
    #9 DB::ReadIndexWorkerManager::ReadIndexRunner::runOneRound(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorkerManager.cpp:145:21 (gtests_dbms+0xd059390)
    #10 DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0::operator()() const /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorkerManager.cpp:166:13 (gtests_dbms+0xd059390)
    #11 decltype(std::declval<DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0>()()) std::__1::__invoke[abi:ue170006]<DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0>(DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0&&) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__type_traits/invoke.h:340:25 (gtests_dbm
s+0xd059390)
    #12 void std::__1::__thread_execute[abi:ue170006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, DB::ReadIndexWorkerManager::
ReadIndexRunner::asyncRun()::$_0>&, std::__1::__tuple_indices<>) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__thread/thread.h:221:5 (gtests_dbms+0xd059390)
    #13 void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0>>(void*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__thread/thread.h:232:5 (gtests_dbms+0xd059390)

    Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative warning message

  Mutex M2 acquired here while holding mutex M1 in main thread:
    #0 pthread_mutex_lock /root/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1339:3 (gtests_dbms+0x2f7375b)                                                                                                                                                                                                                                                #1 std::__1::mutex::lock() <null> (libc++.so.1+0x8f7c5)
    #2 DB::MockRaftStoreProxy::debugAddRegions(DB::KVStore&, DB::TMTContext&, std::__1::vector<unsigned long, std::__1::allocator<unsigned long>>, std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>>&&) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Debug/MockKVStore/MockRaftStoreProxy.cpp:171:26 (gtests_dbms+0xe457b4c)
    #3 DB::tests::RegionKVStoreOldTest_ReadIndex_Test::TestBody() /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/tests/gtest_kvstore.cpp:85:25 (gtests_dbms+0x61693d4)
    #4 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2443:10 (gtests_dbms+0xd0a1689)
    #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2479:14 (gtests_dbms+0xd0a1689)
    #6 testing::Test::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2517:5 (gtests_dbms+0xd076e23)
    #7 testing::TestInfo::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2693:11 (gtests_dbms+0xd07863e)
    #8 testing::TestCase::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2811:28 (gtests_dbms+0xd079623)
    #9 testing::internal::UnitTestImpl::RunAllTests() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:5177:43 (gtests_dbms+0xd08ba3a)
    #10 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2443:10 (gtests_dbms+0xd0a2b9d)
    #11 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2479:14 (gtests_dbms+0xd0a2b9d)
    #12 testing::UnitTest::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:4786:10 (gtests_dbms+0xd08b304)
    #13 RUN_ALL_TESTS() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/include/gtest/gtest.h:2341:46 (gtests_dbms+0x689d936)
    #14 main /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/TestUtils/gtests_dbms_main.cpp:116:16 (gtests_dbms+0x689d936)

  Mutex M0 acquired here while holding mutex M2 in main thread:
    #0 pthread_mutex_lock /root/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1339:3 (gtests_dbms+0x2f7375b)
    #1 std::__1::mutex::lock() <null> (libc++.so.1+0x8f7c5)
    #2 DB::ReadIndexDataNode::~ReadIndexDataNode() /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexDataNode.cpp:290:14 (gtests_dbms+0xd04520b)
    #3 void std::__1::__destroy_at[abi:ue170006]<DB::ReadIndexDataNode, 0>(DB::ReadIndexDataNode*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/construct_at.h:69:13 (gtests_dbms+0xd05121d)
    #4 void std::__1::destroy_at[abi:ue170006]<DB::ReadIndexDataNode, 0>(DB::ReadIndexDataNode*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/construct_at.h:104:5 (gtests_dbms+0xd05121d)
    #5 void std::__1::allocator_traits<std::__1::allocator<DB::ReadIndexDataNode>>::destroy[abi:ue170006]<DB::ReadIndexDataNode, void, void>(std::__1::allocator<DB::ReadIndexDataNode>&, DB::ReadIndexDataNode*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/allocator_traits.h:323:9 (gtests_dbms+0xd05121d)
    #6 std::__1::__shared_ptr_emplace<DB::ReadIndexDataNode, std::__1::allocator<DB::ReadIndexDataNode>>::__on_zero_shared() /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/shared_ptr.h:320:13 (gtests_dbms+0xd05121d)
    #7 std::__1::__shared_count::__release_shared[abi:ue170006]() /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/shared_ptr.h:172:9 (gtests_dbms+0xd05145d)
    #8 std::__1::__shared_weak_count::__release_shared[abi:ue170006]() /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/shared_ptr.h:213:27 (gtests_dbms+0xd05145d)
    #9 std::__1::shared_ptr<DB::ReadIndexDataNode>::~shared_ptr[abi:ue170006]() /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/shared_ptr.h:772:23 (gtests_dbms+0xd05145d)
    #10 std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>::~pair() /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__utility/pair.h:81:29 (gtests_dbms+0xd05145d)
    #11 void std::__1::__destroy_at[abi:ue170006]<std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>, 0>(std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/construct_at.h:69:13 (gtests_dbms+0xd05145d)
    #12 void std::__1::destroy_at[abi:ue170006]<std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>, 0>(std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/construct_at.h:104:5 (gtests_dbms+0xd05145d)
    #13 void std::__1::allocator_traits<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>>>::destroy[abi:ue170006]<std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void, void>(std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_ty
pe<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>>&, std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/allocator_traits.h:323:9 (gtests_dbms+0xd05145d)
    #14 std::__1::__hash_node_destructor<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>>>::operator()[abi:ue170006](std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>*) /DATA/disk1/ra_common/tiflash-env-17/
sysroot/bin/../include/c++/v1/__hash_table:674:13 (gtests_dbms+0xd05145d)
    #15 std::__1::unique_ptr<std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>, std::__1::__hash_node_destructor<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>>>>::reset[abi:ue170006](std::__1::__hash_node
<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/unique_ptr.h:300:7 (gtests_dbms+0xd05145d)
    #16 std::__1::unique_ptr<std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>, std::__1::__hash_node_destructor<std::__1::allocator<std::__1::__hash_node<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>>>>::~unique_ptr[abi:ue170006]() /DATA/disk1/r
a_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/unique_ptr.h:266:75 (gtests_dbms+0xd05145d)
    #17 std::__1::__hash_table<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, std::__1::__unordered_map_hasher<unsigned long, std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, std::__1::hash<unsigned long>, std::__1::equal_to<unsigned long>, true>, std::__1::__unordered_map_equal<un
signed long, std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, std::__1::equal_to<unsigned long>, std::__1::hash<unsigned long>, true>, std::__1::allocator<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>>>::erase(std::__1::__hash_const_iterator<std::__1::__hash_node<std::__1::__hash_v
alue_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, void*>*>) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__hash_table:2230:5 (gtests_dbms+0xd05145d)
    #18 unsigned long std::__1::__hash_table<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, std::__1::__unordered_map_hasher<unsigned long, std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, std::__1::hash<unsigned long>, std::__1::equal_to<unsigned long>, true>, std::__1::__unordere
d_map_equal<unsigned long, std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>, std::__1::equal_to<unsigned long>, std::__1::hash<unsigned long>, true>, std::__1::allocator<std::__1::__hash_value_type<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>>>>::__erase_unique<unsigned long>(unsigned long const&) /DATA/disk1/
ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__hash_table:2256:5 (gtests_dbms+0xd05145d)
    #19 std::__1::unordered_map<unsigned long, std::__1::shared_ptr<DB::ReadIndexDataNode>, std::__1::hash<unsigned long>, std::__1::equal_to<unsigned long>, std::__1::allocator<std::__1::pair<unsigned long const, std::__1::shared_ptr<DB::ReadIndexDataNode>>>>::erase[abi:ue170006](unsigned long const&) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/
c++/v1/unordered_map:1449:59 (gtests_dbms+0xd0503b4)
    #20 DB::ReadIndexWorker::DataMap::removeRegion(unsigned long) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorker.cpp:114:16 (gtests_dbms+0xd0503b4)
    #21 DB::ReadIndexWorker::removeRegion(unsigned long) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorker.cpp:190:14 (gtests_dbms+0xd0503b4)
    #22 DB::KVStore::removeRegion(unsigned long, bool, DB::RegionTable&, DB::KVStoreTaskLock const&, DB::RegionTaskLock const&) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/KVStore.cpp:272:69 (gtests_dbms+0xce5a36f)
    #23 DB::KVStore::handleDestroy(unsigned long, DB::TMTContext&, DB::KVStoreTaskLock const&) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/KVStore.cpp:357:5 (gtests_dbms+0xce5b2b1)
    #24 DB::KVStore::handleDestroy(unsigned long, DB::TMTContext&) /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/KVStore.cpp:336:5 (gtests_dbms+0xce5abde)
    #25 DB::tests::RegionKVStoreOldTest_ReadIndex_Test::TestBody()::$_4::operator()() const /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/tests/gtest_kvstore.cpp:138:13 (gtests_dbms+0x616b05f)
    #26 ext::ScopeGuard<DB::tests::RegionKVStoreOldTest_ReadIndex_Test::TestBody()::$_4>::~ScopeGuard() /DATA/disk1/jaysonhuang/tiflash-master/libs/libcommon/include/ext/scope_guard.h:33:21 (gtests_dbms+0x616b05f)
    #27 DB::tests::RegionKVStoreOldTest_ReadIndex_Test::TestBody() /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/tests/gtest_kvstore.cpp:146:9 (gtests_dbms+0x616b05f)
    #28 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2443:10 (gtests_dbms+0xd0a1689)
    #29 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2479:14 (gtests_dbms+0xd0a1689)
    #30 testing::Test::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2517:5 (gtests_dbms+0xd076e23)
    #31 testing::TestInfo::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2693:11 (gtests_dbms+0xd07863e)
    #32 testing::TestCase::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2811:28 (gtests_dbms+0xd079623)
    #33 testing::internal::UnitTestImpl::RunAllTests() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:5177:43 (gtests_dbms+0xd08ba3a)
    #34 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2443:10 (gtests_dbms+0xd0a2b9d)
    #35 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2479:14 (gtests_dbms+0xd0a2b9d)
    #36 testing::UnitTest::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:4786:10 (gtests_dbms+0xd08b304)
    #37 RUN_ALL_TESTS() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/include/gtest/gtest.h:2341:46 (gtests_dbms+0x689d936)
    #38 main /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/TestUtils/gtests_dbms_main.cpp:116:16 (gtests_dbms+0x689d936)


  Thread T155 'ReadIndexWkr-0' (tid=3111840, running) created by main thread at:
    #0 pthread_create /root/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1020:3 (gtests_dbms+0x2f71a5b)
    #1 std::__1::__libcpp_thread_create[abi:ue170006](unsigned long*, void* (*)(void*), void*) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__threading_support:371:10 (gtests_dbms+0xd0576f4)
    #2 std::__1::thread::thread<DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0, void>(DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0&&) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__thread/thread.h:248:16 (gtests_dbms+0xd0576f4)
    #3 std::__1::__unique_if<std::__1::thread>::__unique_single std::__1::make_unique[abi:ue170006]<std::__1::thread, DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0>(DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun()::$_0&&) /DATA/disk1/ra_common/tiflash-env-17/sysroot/bin/../include/c++/v1/__memory/unique_ptr.h:689:30 (gtests_dbms+0xd0576f4)
    #4 DB::ReadIndexWorkerManager::ReadIndexRunner::asyncRun() /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorkerManager.cpp:151:19 (gtests_dbms+0xd0576f4)
    #5 DB::ReadIndexWorkerManager::asyncRun() /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndexWorkerManager.cpp:70:17 (gtests_dbms+0xd05763f)
    #6 DB::KVStore::asyncRunReadIndexWorkers() const /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/Read/ReadIndex.cpp:357:32 (gtests_dbms+0xd039648)
    #7 DB::tests::RegionKVStoreOldTest_ReadIndex_Test::TestBody() /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/Storages/KVStore/tests/gtest_kvstore.cpp:118:17 (gtests_dbms+0x616a16f)
    #8 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2443:10 (gtests_dbms+0xd0a1689)
    #9 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2479:14 (gtests_dbms+0xd0a1689)
    #10 testing::Test::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2517:5 (gtests_dbms+0xd076e23)
    #11 testing::TestInfo::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2693:11 (gtests_dbms+0xd07863e)
    #12 testing::TestCase::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2811:28 (gtests_dbms+0xd079623)
    #13 testing::internal::UnitTestImpl::RunAllTests() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:5177:43 (gtests_dbms+0xd08ba3a)
    #14 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2443:10 (gtests_dbms+0xd0a2b9d)
    #15 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:2479:14 (gtests_dbms+0xd0a2b9d)
    #16 testing::UnitTest::Run() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/src/gtest.cc:4786:10 (gtests_dbms+0xd08b304)
    #17 RUN_ALL_TESTS() /DATA/disk1/jaysonhuang/tiflash-master/contrib/googletest/googletest/include/gtest/gtest.h:2341:46 (gtests_dbms+0x689d936)
    #18 main /DATA/disk1/jaysonhuang/tiflash-master/dbms/src/TestUtils/gtests_dbms_main.cpp:116:16 (gtests_dbms+0x689d936)

SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) (/DATA/disk1/ra_common/tiflash-env-17/sysroot/lib/x86_64-unknown-linux-gnu/libc++.so.1+0x8f7c5) in std::__1::mutex::lock()
==================
[2024/05/27 16:52:43.603 +08:00] [INFO] [WALStore.cpp:143] ["Creating log file for writing [fullname=/DATA/disk1/jaysonhuang/tiflash-master/cmake-build-tsan/tmp/region_kvs_old_test/page/kvstore/wal/log_1_0]"] [source=RegionPersister] [thread_id=1]
[2024/05/27 16:52:43.604 +08:00] [INFO] [KVStore.cpp:281] ["Persisted region_id=9 deleted"] [thread_id=1]
[2024/05/27 16:52:43.604 +08:00] [INFO] [RegionTable.cpp:237] ["remove region in RegionTable done, region_id=9"] [thread_id=1]
[2024/05/27 16:52:43.604 +08:00] [INFO] [RegionTable.cpp:249] ["remove region in storage done, region_id=9"] [thread_id=1]
[2024/05/27 16:52:43.604 +08:00] [INFO] [KVStore.cpp:285] ["Remove region_id=9 done"] [thread_id=1]
[2024/05/27 16:52:43.604 +08:00] [INFO] [ReadIndexWorkerManager.cpp:171] ["Start to stop read-index runner 0"] [source=ReadIndexWorkers] [thread_id=146]
[2024/05/27 16:52:43.605 +08:00] [INFO] [ReadIndexWorkerManager.cpp:132] ["Thread of read-index runner 0 has joined"] [source=ReadIndexWorkers] [thread_id=1]
[2024/05/27 16:52:43.605 +08:00] [INFO] [ReadIndexWorkerManager.cpp:161] ["Start read-index runner 0"] [source=ReadIndexWorkers] [thread_id=148]
[2024/05/27 16:52:43.605 +08:00] [INFO] [ReadIndex.cpp:143] ["start to check regions ready, min-wait-tick 0.001s, max-wait-tick 0.002s, wait-region-ready-timeout 0.005s"] [source=WaitCheckRegionReady] [thread_id=149]
[2024/05/27 16:52:43.605 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=1 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=148]
[2024/05/27 16:52:43.606 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=2 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=148]
[2024/05/27 16:52:43.606 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=0 region_id=3 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=148]
[2024/05/27 16:52:43.606 +08:00] [INFO] [ReadIndex.cpp:255] ["1 regions need to apply to latest index, sleep for 0.001s"] [source=WaitCheckRegionReady] [thread_id=149]
[2024/05/27 16:52:43.607 +08:00] [INFO] [ReadIndex.cpp:255] ["1 regions need to apply to latest index, sleep for 0.002s"] [source=WaitCheckRegionReady] [thread_id=149]
[2024/05/27 16:52:43.609 +08:00] [INFO] [ReadIndex.cpp:255] ["1 regions need to apply to latest index, sleep for 0.002s"] [source=WaitCheckRegionReady] [thread_id=149]
[2024/05/27 16:52:43.611 +08:00] [WARN] [ReadIndex.cpp:282] ["1 regions CANNOT catch up with latest index, (region-id,latest-index,apply-index): 2,6,5"] [source=WaitCheckRegionReady] [thread_id=149]
[2024/05/27 16:52:43.611 +08:00] [INFO] [ReadIndex.cpp:289] ["finish to check 3 regions, time cost 0.006s"] [source=WaitCheckRegionReady] [thread_id=149]
[2024/05/27 16:52:43.611 +08:00] [INFO] [ReadIndexWorkerManager.cpp:171] ["Start to stop read-index runner 0"] [source=ReadIndexWorkers] [thread_id=148]
[2024/05/27 16:52:43.611 +08:00] [INFO] [ReadIndexWorkerManager.cpp:132] ["Thread of read-index runner 0 has joined"] [source=ReadIndexWorkers] [thread_id=1]
[2024/05/27 16:52:43.612 +08:00] [INFO] [ReadIndexWorkerManager.cpp:161] ["Start read-index runner 0"] [source=ReadIndexWorkers] [thread_id=150]
[2024/05/27 16:52:43.622 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=8 region_id=1 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=150]
[2024/05/27 16:52:43.633 +08:00] [TRACE] [ReadIndexDataNode.cpp:93] ["[Learner Read] Read Index in Batch(use histroy), max_ts=8 region_id=1 waiting_tasks=1 running_tasks=0 histroy_ts=8"] [thread_id=150]
[2024/05/27 16:52:43.643 +08:00] [TRACE] [ReadIndexDataNode.cpp:135] ["[Learner Read] Read Index in Batch(new request), max_ts=10 region_id=1 waiting_tasks=1 running_tasks=1 should_build_running_task=true build_success=true"] [thread_id=150]
[2024/05/27 16:52:43.653 +08:00] [TRACE] [ReadIndexDataNode.cpp:93] ["[Learner Read] Read Index in Batch(use histroy), max_ts=8 region_id=1 waiting_tasks=1 running_tasks=0 histroy_ts=10"] [thread_id=150]
[2024/05/27 16:52:43.655 +08:00] [WARN] [ReadIndex.cpp:121] ["[region_id=2] wait learner index timeout, prev_index=5 curr_index=5 to_wait=668 state=0 elapsed_s=0.002 timeout_s=0.002"] [thread_id=1]
[2024/05/27 16:52:43.658 +08:00] [DEBUG] [ReadIndex.cpp:101] ["[region_id=2] wait learner index done, prev_index=5 curr_index=668 to_wait=668 elapsed_s=0.002 timeout_s=100.000"] [thread_id=151]
[2024/05/27 16:52:43.658 +08:00] [INFO] [ReadIndexWorkerManager.cpp:171] ["Start to stop read-index runner 0"] [source=ReadIndexWorkers] [thread_id=150]
[2024/05/27 16:52:43.658 +08:00] [INFO] [ReadIndexWorkerManager.cpp:132] ["Thread of read-index runner 0 has joined"] [source=ReadIndexWorkers] [thread_id=1]
[2024/05/27 16:52:43.658 +08:00] [INFO] [ReadIndex.cpp:371] ["KVStore shutdown, deleting read index worker"] [thread_id=1]
[2024/05/27 16:52:43.659 +08:00] [INFO] [KVStore.cpp:432] ["Destroy KVStore"] [thread_id=1]
[2024/05/27 16:52:43.659 +08:00] [INFO] [ReadIndex.cpp:371] ["KVStore shutdown, deleting read index worker"] [thread_id=1]
[2024/05/27 16:52:43.659 +08:00] [INFO] [KVStore.cpp:434] ["Destroy KVStore Finished"] [thread_id=1]
[       OK ] RegionKVStoreOldTest.ReadIndex (960 ms)
[----------] 1 test from RegionKVStoreOldTest (960 ms total)

4. What is your TiFlash version? (Required)

@JaySon-Huang JaySon-Huang added the type/bug The issue is confirmed as a bug. label May 27, 2024
@CalvinNeo CalvinNeo self-assigned this May 29, 2024
@CalvinNeo
Copy link
Member

CalvinNeo commented May 29, 2024

M1: MockRaftStoreProxy's lock guard
M2: KVStore::taskLock
M0: ReadIndexDataNode's lock guard

M1: MockRaftStoreProxy's lock guard
M2:KVStore::taskLock
M0:ReadIndexDataNode's lock guard

Mutex M1 acquired here while holding mutex M0 in thread T155:
M0: From `ReadIndexDataNode::runOneRound`
M1: In case the regions mutates

Mutex M2 acquired here while holding mutex M1 in main thread:
M1: To mutate regions in mock proxy
M2: To mutate regions in kvstore

Mutex M0 acquired here while holding mutex M2 in main thread:
M2: From KVStore::handleDestroy
M0: To release

@CalvinNeo CalvinNeo mentioned this issue May 30, 2024
12 tasks
ti-chi-bot bot pushed a commit that referenced this issue May 31, 2024
close #9091, ref #9098

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants