[DocDB] Deadlock with yql partitions vtable and delete tables #16109

Closed
hulien22 opened this issue Feb 14, 2023 · 1 comment

Labels: area/docdb, kind/bug, priority/critical

hulien22 commented Feb 14, 2023

Jira Link: DB-5514

Description

A lock order inversion between the catalog manager mutex_ and the YQLPartitionsVTable mutex_ causes a deadlock on the master.

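Thread 214 (table deletion) holds the catalog manager mutex_ and is blocked acquiring the YQLPartitionsVTable mutex_ in RemoveFromCache: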

Thread 214 (Thread 0x7f381556a700 (LWP 76554)):
#0  0x00007f383c06aa0a in futex_wait (private=0, expected=0, futex_word=0x1820922c) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1  futex_wait_simple (private=0, expected=0, futex_word=0x1820922c) at ../sysdeps/nptl/futex-internal.h:135
#2  __pthread_rwlock_wrlock_slow (rwlock=rwlock@entry=0x18209220) at pthread_rwlock_wrlock.c:67
#3  0x00007f383c0705e8 in __GI___pthread_rwlock_wrlock (rwlock=rwlock@entry=0x18209220) at pthread_rwlock_wrlock.c:124
#4  0x00007f384c211557 in std::shared_timed_mutex::lock (this=0x18209220) at /cases/home/yugabyte/yb-software/yugabyte-2.12.5.0/linuxbrew-xxxxxxxxxxxxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/shared_mutex:102
#5  std::lock_guard<std::shared_timed_mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /cases/home/yugabyte/yb-software/yugabyte-2.12.5.0/linuxbrew-xxxxxxxxxxxxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/mutex:386
#6  yb::master::YQLPartitionsVTable::RemoveFromCache (this=0x182091e0, table_id=...) at ../../src/yb/master/yql_partitions_vtable.cc:262
#7  0x00007f384c0931e5 in yb::master::CatalogManager::DeleteTableInternal (this=this@entry=0x275a000, req=req@entry=0x225a2138, resp=resp@entry=0x225a2160, rpc=rpc@entry=0x7f3815561cb0) at ../../src/yb/master/catalog_manager.cc:4831
#8  0x00007f384c0949f7 in yb::master::CatalogManager::DeleteTable (this=0x275a000, req=req@entry=0x225a2138, resp=resp@entry=0x225a2160, rpc=rpc@entry=0x7f3815561cb0) at ../../src/yb/master/catalog_manager.cc:4768
#9  0x00007f384c144fe9 in yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}::operator()() const (__closure=<synthetic pointer>) at ../../src/yb/master/master_service_base-internal.h:146
#10 yb::master::MasterServiceBase::HandleOnLeader<yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB, yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*, yb::Status (yb::master::CatalogManager::*)(yb::master::DeleteTableRequestPB const*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext*), char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>)::{lambda()#1}, char const*, int, char const*, yb::StronglyTypedBool<yb::master::HoldCatalogLock_Tag>) (hold_catalog_lock=..., function_name=<synthetic pointer>, line_number=59, file_name=0x7f384c2a68b8 "../../src/yb/master/master_ddl_service.cc", f=..., rpc=0x7f3815561cb0, resp=0x225a2160, req=0x225a2138, this=<optimized out>) at ../../src/yb/master/master_service_base-internal.h:84
#11 yb::master::MasterServiceBase::HandleIn<yb::master::CatalogManager, yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB> (hold_catalog_lock=..., function_name=<synthetic pointer>, line_number=59, file_name=0x7f384c2a68b8 "../../src/yb/master/master_ddl_service.cc", f=<optimized out>, rpc=0x7f3815561cb0, resp=0x225a2160, req=<optimized out>, this=<optimized out>) at ../../src/yb/master/master_service_base-internal.h:145
#12 yb::master::(anonymous namespace)::MasterDdlServiceImpl::DeleteTable (this=<optimized out>, req=0x225a2138, resp=0x225a2160, rpc=...) at ../../src/yb/master/master_ddl_service.cc:29
#13 0x00007f3843ec7d2b in <lambda(const yb::master::DeleteTableRequestPB*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext)>::operator() (rpc_context=..., resp=0x225a2160, req=0x225a2138, __closure=<synthetic pointer>) at src/yb/master/master_ddl.service.cc:758
#14 yb::rpc::HandleCall<yb::rpc::RpcCallPBParamsImpl<yb::master::DeleteTableRequestPB, yb::master::DeleteTableResponsePB>, yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)>::<lambda(const yb::master::DeleteTableRequestPB*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext)> >(yb::rpc::InboundCallPtr, <lambda(const yb::master::DeleteTableRequestPB*, yb::master::DeleteTableResponsePB*, yb::rpc::RpcContext)>) (call=..., f=...) at ../../src/yb/rpc/local_call.h:122
#15 0x00007f3843ec7f3d in <lambda(yb::rpc::InboundCallPtr)>::operator() (call=..., __closure=0x281b858) at src/yb/master/master_ddl.service.cc:759
#16 std::_Function_handler<void(std::shared_ptr<yb::rpc::InboundCall>), yb::master::MasterDdlIf::InitMethods(const scoped_refptr<yb::MetricEntity>&)::<lambda(yb::rpc::InboundCallPtr)> >::_M_invoke(const std::_Any_data &, std::shared_ptr<yb::rpc::InboundCall> &&) (__functor=..., __args#0=...) at /cases/home/yugabyte/yb-software/yugabyte-2.12.5.0/linuxbrew-xxxxxxxxxxxxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:1871
#17 0x00007f3843ebcbd4 in std::function<void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::shared_ptr<yb::rpc::InboundCall>) const (__args#0=..., this=<optimized out>) at /cases/home/yugabyte/yb-software/yugabyte-2.12.5.0/linuxbrew-xxxxxxxxxxxxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#18 yb::master::MasterDdlIf::Handle (this=<optimized out>, call=...) at src/yb/master/master_ddl.service.cc:591
#19 0x00007f3841bd794e in yb::rpc::ServicePoolImpl::Handle (this=0x27d6fc0, incoming=...) at ../../src/yb/rpc/service_pool.cc:269
#20 0x00007f3841b79bd4 in yb::rpc::InboundCall::InboundCallTask::Run (this=<optimized out>) at ../../src/yb/rpc/inbound_call.cc:223
#21 0x00007f3841beb698 in yb::rpc::(anonymous namespace)::Worker::Execute (this=<optimized out>) at ../../src/yb/rpc/thread_pool.cc:104
#22 0x00007f3840d2bca5 in std::function<void ()>::operator()() const (this=0x1f09c298) at /cases/home/yugabyte/yb-software/yugabyte-2.12.5.0/linuxbrew-xxxxxxxxxxxxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#23 yb::Thread::SuperviseThread (arg=0x1f09c240) at ../../src/yb/util/thread.cc:774
#24 0x00007f383c06c694 in start_thread (arg=0x7f381556a700) at pthread_create.c:333
#25 0x00007f383b7a941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

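Thread 696 (vtable rebuild) holds the YQLPartitionsVTable mutex_ and is spinning to acquire the catalog manager mutex_ in shared mode in GetTables: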

Thread 696 (Thread 0x7f376740e700 (LWP 126125)):
#0  0x00007f383b792517 in sched_yield () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f384c0b370a in boost::detail::yield (k=7686415) at /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064200-dd4872fe56-centos7-x86_64-linuxbrew-gcc5/installed/uninstrumented/include/boost/smart_ptr/detail/yield_k.hpp:144
#2  yb::rw_semaphore::lock_shared (this=0x275a028) at ../../src/yb/util/rw_semaphore.h:81
#3  yb::rw_spinlock::lock_shared (this=0x275a028) at ../../src/yb/util/locks.h:115
#4  yb::NonRecursiveSharedLock<yb::rw_spinlock>::NonRecursiveSharedLock (this=0x7f3767405da0, mutex=...) at ../../src/yb/util/debug/lock_debug.h:41
#5  0x00007f384c04a4b1 in yb::master::CatalogManager::GetTables (this=0x275a000, mode=yb::master::GetTablesMode::kVisibleToClient) at ../../src/yb/master/catalog_manager.cc:5885
#6  0x00007f384c2148f2 in yb::master::YQLPartitionsVTable::GenerateAndCacheData (this=0x182091e0) at ../../src/yb/master/yql_partitions_vtable.cc:140
#7  0x00007f384c03c35d in yb::master::CatalogManager::RebuildYQLSystemPartitions (this=0x275a000) at ../../src/yb/master/catalog_manager.cc:10486
#8  0x00007f3840d311b4 in yb::ThreadPool::DispatchThread (this=0x2441800, permanent=false) at ../../src/yb/util/threadpool.cc:611
#9  0x00007f3840d2bca5 in std::function<void ()>::operator()() const (this=0x2694f918) at /cases/home/yugabyte/yb-software/yugabyte-2.12.5.0/linuxbrew-xxxxxxxxxxxxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/functional:2267
#10 yb::Thread::SuperviseThread (arg=0x2694f8c0) at ../../src/yb/util/thread.cc:774
#11 0x00007f383c06c694 in start_thread (arg=0x7f376740e700) at pthread_create.c:333
#12 0x00007f383b7a941d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
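The two traces form a cycle: thread 214 holds the catalog manager mutex_ and waits for the vtable mutex_, while thread 696 holds the vtable mutex_ and waits for the catalog manager mutex_. The sketch below is a minimal, hypothetical C++14 model of that cycle, not YugabyteDB code. cm_mutex and vtable_mutex are illustrative stand-ins (both modeled as std::shared_timed_mutex, although the real catalog manager lock is a yb::rw_spinlock), and the deletion path is assumed to hold the catalog lock exclusively. Timed try-locks replace blocking lock() calls so that the program reports the cycle instead of hanging.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical model of the two code paths in this issue; names are illustrative.
//   cm_mutex     ~ CatalogManager::mutex_ (a yb::rw_spinlock in the real code)
//   vtable_mutex ~ the YQLPartitionsVTable std::shared_timed_mutex
struct InversionResult {
  bool delete_path_got_vtable_lock = false;
  bool rebuild_path_got_cm_lock = false;
};

// Two-party rendezvous: returns once both threads have called it.
static void ArriveAndWait(std::atomic<int>& counter) {
  counter.fetch_add(1);
  while (counter.load() < 2) std::this_thread::yield();
}

InversionResult RunInvertedLockOrder() {
  using namespace std::chrono_literals;
  std::shared_timed_mutex cm_mutex;
  std::shared_timed_mutex vtable_mutex;
  std::atomic<int> first_locks_held{0};
  std::atomic<int> attempts_done{0};
  InversionResult result;

  // Models DeleteTableInternal -> RemoveFromCache: CM lock first, then vtable lock.
  std::thread delete_path([&] {
    std::unique_lock<std::shared_timed_mutex> cm(cm_mutex);  // assumed exclusive
    ArriveAndWait(first_locks_held);  // the other thread now holds vtable_mutex
    // Timed attempt (try_lock_for) instead of a blocking lock(), so the demo cannot hang.
    std::unique_lock<std::shared_timed_mutex> vt(vtable_mutex, 200ms);
    result.delete_path_got_vtable_lock = vt.owns_lock();
    ArriveAndWait(attempts_done);  // keep holding cm until both attempts are finished
  });

  // Models RebuildYQLSystemPartitions -> GenerateAndCacheData -> GetTables:
  // vtable lock first, then a shared CM lock.
  std::thread rebuild_path([&] {
    std::unique_lock<std::shared_timed_mutex> vt(vtable_mutex);
    ArriveAndWait(first_locks_held);  // the other thread now holds cm_mutex
    std::shared_lock<std::shared_timed_mutex> cm(cm_mutex, 200ms);  // try_lock_shared_for
    result.rebuild_path_got_cm_lock = cm.owns_lock();
    ArriveAndWait(attempts_done);  // keep holding vt until both attempts are finished
  });

  delete_path.join();
  rebuild_path.join();
  return result;
}
```

Calling RunInvertedLockOrder() is expected to return both flags as false: each thread times out waiting for the lock the other one holds. With blocking lock() calls, as in the real code paths, the same interleaving would wait forever.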
hulien22 added the area/docdb and status/awaiting-triage labels on Feb 14, 2023
hulien22 self-assigned this on Feb 14, 2023
yugabyte-ci added the kind/bug and priority/medium labels on Feb 14, 2023
hulien22 commented:

This was not caught by TSAN deadlock detection in the CppCassandraDriverTest.YQLPartitionsVtableCacheRefresh test because of #14710.
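ThreadSanitizer's deadlock detector does not need an actual hang to flag this pattern: it records the order in which locks are acquired and reports a lock-order inversion when two locks are taken in opposite orders. The following minimal sketch (hypothetical names, plain std::mutex rather than the locks involved in this issue) never deadlocks at runtime, because the second thread starts only after the first has finished. When built with -fsanitize=thread it is expected to produce a "lock-order-inversion (potential deadlock)" warning. Why detection did not fire for the real locks in this test is tracked in #14710 and is not modeled here.

```cpp
#include <mutex>
#include <thread>

// Takes two mutexes in opposite orders on two threads that never overlap in time.
// No real deadlock can occur, but the acquisition-order graph contains a cycle
// (a -> b, then b -> a), which is what TSAN's deadlock detector looks for.
int TakeLocksInBothOrders() {
  std::mutex a;  // hypothetical stand-in for CatalogManager::mutex_
  std::mutex b;  // hypothetical stand-in for the vtable mutex_
  int acquisitions = 0;

  std::thread first([&] {
    std::lock_guard<std::mutex> la(a);
    std::lock_guard<std::mutex> lb(b);  // order: a then b
    ++acquisitions;
  });
  first.join();  // the second thread starts only after the first one finished

  std::thread second([&] {
    std::lock_guard<std::mutex> lb(b);
    std::lock_guard<std::mutex> la(a);  // order: b then a (inverted)
    ++acquisitions;
  });
  second.join();

  return acquisitions;  // both threads completed
}
```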

hulien22 added a commit that referenced this issue Feb 17, 2023
Summary:
Fixing the lock order inversion between building the YQL system.partitions vtable and deleting tables.
One thread deleting a table in the catalog manager holds the CM's mutex_ and is blocked acquiring the yqlpartitions mutex_:
```
#5  std::lock_guard<std::shared_timed_mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /cases/home/yugabyte/yb-software/yugabyte-2.12.5.0/linuxbrew-xxxxxxxxxxxxxxxxxxxxxxxx/Cellar/gcc/5.5.0_4/include/c++/5.5.0/mutex:386
#6  yb::master::YQLPartitionsVTable::RemoveFromCache (this=0x182091e0, table_id=...) at ../../src/yb/master/yql_partitions_vtable.cc:262
#7  0x00007f384c0931e5 in yb::master::CatalogManager::DeleteTableInternal (this=this@entry=0x275a000, req=req@entry=0x225a2138, resp=resp@entry=0x225a2160, rpc=rpc@entry=0x7f3815561cb0) at ../../src/yb/master/catalog_manager.cc:4831
```
Meanwhile, another thread rebuilding the vtable holds the yqlpartitions lock and is waiting on the CM mutex_:
```
#4  yb::NonRecursiveSharedLock<yb::rw_spinlock>::NonRecursiveSharedLock (this=0x7f3767405da0, mutex=...) at ../../src/yb/util/debug/lock_debug.h:41
#5  0x00007f384c04a4b1 in yb::master::CatalogManager::GetTables (this=0x275a000, mode=yb::master::GetTablesMode::kVisibleToClient) at ../../src/yb/master/catalog_manager.cc:5885
#6  0x00007f384c2148f2 in yb::master::YQLPartitionsVTable::GenerateAndCacheData (this=0x182091e0) at ../../src/yb/master/yql_partitions_vtable.cc:140
#7  0x00007f384c03c35d in yb::master::CatalogManager::RebuildYQLSystemPartitions (this=0x275a000) at ../../src/yb/master/catalog_manager.cc:10486
```

For now, the default remains generating the partitions cache on changes, because of table-creation performance concerns with the purely background-thread approach.
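The commit summary states that the inversion is fixed but does not describe the mechanism. One common way to remove this kind of cycle, shown here only as a hypothetical C++14 sketch and not as the change made in D22943, is to ensure the rebuild path never holds the vtable lock while acquiring the catalog lock. It copies the table list under a shared catalog lock, releases that lock, and only then takes the vtable lock to install the result. The only remaining nesting is catalog lock, then vtable lock (in the deletion path), so no cycle is possible. Because the copy can be outdated by the time it is installed, the sketch tags it with an assumed catalog version counter and discards stale copies. PartitionsCache, Catalog, ReplaceIfNotStale and the version fields are all illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical model of the partitions vtable cache (not YugabyteDB's real API).
class PartitionsCache {
 public:
  // Called with the catalog lock held: the only nesting order is catalog -> cache.
  void RemoveFromCache(const std::string& table_id, uint64_t catalog_version) {
    std::lock_guard<std::shared_timed_mutex> l(mutex_);
    cache_.erase(table_id);
    version_ = catalog_version;
  }

  // Called WITHOUT the catalog lock. Rejects a snapshot older than a change
  // that has already been applied to the cache.
  bool ReplaceIfNotStale(std::map<std::string, int> snapshot, uint64_t snapshot_version) {
    std::lock_guard<std::shared_timed_mutex> l(mutex_);
    if (snapshot_version < version_) return false;  // a later delete was already applied
    cache_ = std::move(snapshot);
    version_ = snapshot_version;
    return true;
  }

  std::size_t Size() {
    std::shared_lock<std::shared_timed_mutex> l(mutex_);
    return cache_.size();
  }

 private:
  std::shared_timed_mutex mutex_;
  std::map<std::string, int> cache_;  // table_id -> illustrative partition count
  uint64_t version_ = 0;
};

// Hypothetical model of the catalog manager's table map and its mutex_.
class Catalog {
 public:
  explicit Catalog(PartitionsCache* cache) : cache_(cache) {}

  void CreateTable(const std::string& id) {  // does not touch the cache
    std::unique_lock<std::shared_timed_mutex> l(mutex_);
    tables_[id] = 1;
    ++version_;
  }

  // Models the deletion path: cache updated while the catalog lock is held.
  void DeleteTable(const std::string& id) {
    std::unique_lock<std::shared_timed_mutex> l(mutex_);
    tables_.erase(id);
    ++version_;
    cache_->RemoveFromCache(id, version_);
  }

  // Models a reordered rebuild path: snapshot under the catalog lock,
  // release it, and only then take the cache lock.
  bool RebuildCache() {
    std::map<std::string, int> snapshot;
    uint64_t version;
    {
      std::shared_lock<std::shared_timed_mutex> l(mutex_);
      snapshot = tables_;
      version = version_;
    }  // catalog lock released before the cache lock is acquired
    return cache_->ReplaceIfNotStale(std::move(snapshot), version);
  }

 private:
  std::shared_timed_mutex mutex_;
  std::map<std::string, int> tables_;
  uint64_t version_ = 0;
  PartitionsCache* cache_;
};

// Runs deletes and rebuilds concurrently; returns the final cache size.
std::size_t RunConcurrentDeletesAndRebuilds(int num_tables) {
  PartitionsCache cache;
  Catalog catalog(&cache);
  for (int i = 0; i < num_tables; ++i) catalog.CreateTable("table_" + std::to_string(i));
  std::thread deleter([&] {
    for (int i = 0; i < num_tables; ++i) catalog.DeleteTable("table_" + std::to_string(i));
  });
  std::thread rebuilder([&] {
    for (int i = 0; i < num_tables; ++i) catalog.RebuildCache();
  });
  deleter.join();
  rebuilder.join();
  return cache.Size();
}
```

RunConcurrentDeletesAndRebuilds(1000) is expected to finish and return 0: every table is deleted, and any snapshot taken before the last deletion is rejected as stale. Without the version check, a rebuild that copied the table list just before a deletion could reinsert the deleted table into the cache.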

Test Plan:
```
ybd tsan --gtest_filter CppCassandraDriverTest.YQLPartitionsVtableCacheRefresh
```

Reviewers: bogdan, sergei, asrivastava

Reviewed By: asrivastava

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D22943
hulien22 added a commit that referenced this issue Feb 18, 2023
Summary:
Original commit: 68fae2f / D22943
Fixing the lock order inversion between building the YQL system.partitions vtable and deleting tables.
One thread deleting a table in the catalog manager holds the CM's mutex_ and is blocked acquiring the yqlpartitions mutex_. Meanwhile, another thread rebuilding the vtable holds the yqlpartitions lock and is waiting on the CM mutex_.

For now, the default remains generating the partitions cache on changes, because of table-creation performance concerns with the purely background-thread approach.

Test Plan:
```
ybd tsan --gtest_filter CppCassandraDriverTest.YQLPartitionsVtableCacheRefresh
```

Reviewers: bogdan, sergei, asrivastava

Reviewed By: asrivastava

Subscribers: ybase, kannan

Differential Revision: https://phabricator.dev.yugabyte.com/D23006
yugabyte-ci added the priority/critical label and removed the status/awaiting-triage and priority/medium labels on Feb 21, 2023
hulien22 added a commit that referenced this issue Mar 8, 2023
Summary:
Original commit: 68fae2f / D22943
Fixing the lock order inversion between building the YQL system.partitions vtable and deleting tables.
One thread deleting a table in the catalog manager holds the CM's mutex_ and is blocked acquiring the yqlpartitions mutex_. Meanwhile, another thread rebuilding the vtable holds the yqlpartitions lock and is waiting on the CM mutex_.

For now, the default remains generating the partitions cache on changes, because of table-creation performance concerns with the purely background-thread approach.

Test Plan:
```
ybd tsan --gtest_filter CppCassandraDriverTest.YQLPartitionsVtableCacheRefresh
```

Reviewers: bogdan, sergei, asrivastava

Reviewed By: asrivastava

Subscribers: ybase, kannan

Differential Revision: https://phabricator.dev.yugabyte.com/D23005