[DocDB] Lock inversion #16136

Closed
amitanandaiyer opened this issue Feb 16, 2023 · 8 comments
Labels: area/docdb, kind/bug, priority/critical

Comments

@amitanandaiyer
Contributor

amitanandaiyer commented Feb 16, 2023

Jira Link: DB-5575

Description

[ts-3] WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) (pid=15958)
[ts-3] Cycle in lock order graph: M0 (0x7b6000040438) => M1 (0x7b4400090f70) => M2 (0x7b6400012288) => M3 (0x7b4c000d0970) => M0
[ts-3]
[ts-3] Mutex M1 acquired here while holding mutex M0 in thread T281:
[ts-3] #0 AnnotateRWLockAcquired /opt/yb-build/llvm/yb-llvm-v15.0.3-yb-1-1667341687-0b8d1183-centos7-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cpp:190:3 (yb-tserver+0xd05c9)
[ts-3] #1 yb::rw_spinlock::lock() /net/dev-server-amitanand2/share/code/yugabyte-db-2022/build/tsan-clang15-dynamic-ninja/../../src/yb/util/locks.h:129:5 (libyb_client.so+0x3c45db)
[ts-3] #2 std::__1::lock_guard<yb::rw_spinlock>::lock_guard[abi:v15003](yb::rw_spinlock&) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230207052503-9dadefc8ee-centos7-x86_64-clang15/installed/tsan/libcxx/include/c++/v1/__mutex_base:94:27 (libyb_client.so+0x3c45db)
[ts-3] #3 yb::client::internal::RemoteTablet::MarkReplicaFailed(yb::client::internal::RemoteTabletServer*, yb::Status const&) /net/dev-server-amitanand2/share/code/yugabyte-db-2022/build/tsan-clang15-dynamic-ninja/../../src/yb/client/meta_cache.cc:393:32 (libyb_client.so+0x3c45db)
[ts-3] #4 yb::client::internal::TabletInvoker::FailToNewReplica(yb::Status const&, yb::tserver::TabletServerErrorPB const*) /net/dev-server-amitanand2/share/code/yugabyte-db-2022/build/tsan-clang15-dynamic-ninja/../../src/yb/client/tablet_rpc.cc:302:39 (libyb_client.so+0x434b4c)
[ts-3] #5 yb::client::internal::TabletInvoker::Done(yb::Status*) /net/dev-server-amitanand2/share/code/yugabyte-db-2022/build/tsan-clang15-dynamic-ninja/../../src/yb/client/tablet_rpc.cc:424:15 (libyb_client.so+0x435d28)
[ts-3] #6 yb::client::(anonymous namespace)::TransactionRpcBase::Finished(yb::Status const&) /net/dev-server-amitanand2/share/code/yugabyte-db-2022/build/tsan-clang15-dynamic-ninja/../../src/yb/client/transaction_rpc.cc:65:18 (libyb_client.so+0x48375e)
[ts-3] #7 decltype(std::declval<yb::client::(anonymous namespace)::TransactionRpcBase&>().*std::declval<void (yb::client::(anonymous namespace)::TransactionRpcBase::*&)(yb::Status const&)>()(std::declvalyb::Status::OK&())) std::__1::__invoke[abi:v15003]<void
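For context on the report format: TSan records an ordering edge whenever a thread acquires one mutex while already holding another (for example, M1 acquired while holding M0 in thread T281 above), and it warns when those edges form a cycle, even if no deadlock actually happened in this run. A minimal sketch of that idea, assuming a string-keyed graph and a depth-first search; the `LockOrderGraph` class is hypothetical and is not TSan's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified lock-order graph. An edge held -> acquired means
// "`acquired` was locked while `held` was already locked by the same thread".
// A cycle in this graph is what TSan reports as a potential deadlock.
class LockOrderGraph {
 public:
  void AddEdge(const std::string& held, const std::string& acquired) {
    edges_[held].insert(acquired);
  }

  // Returns the locks on one cycle (e.g. {"M0", "M1", "M2", "M3"} for
  // M0 => M1 => M2 => M3 => M0), or an empty vector if there is none.
  std::vector<std::string> FindCycle() const {
    std::map<std::string, int> state;  // 0 = unvisited, 1 = on DFS path, 2 = done
    std::vector<std::string> path;
    std::vector<std::string> cycle;
    for (const auto& entry : edges_) {
      if (state[entry.first] == 0 && Dfs(entry.first, state, path, cycle)) {
        return cycle;
      }
    }
    return cycle;
  }

 private:
  bool Dfs(const std::string& node, std::map<std::string, int>& state,
           std::vector<std::string>& path,
           std::vector<std::string>& cycle) const {
    state[node] = 1;
    path.push_back(node);
    auto it = edges_.find(node);
    if (it != edges_.end()) {
      for (const auto& next : it->second) {
        if (state[next] == 1) {
          // Back edge: the cycle is the part of the current path from `next`.
          cycle.assign(std::find(path.begin(), path.end(), next), path.end());
          return true;
        }
        if (state[next] == 0 && Dfs(next, state, path, cycle)) {
          return true;
        }
      }
    }
    path.pop_back();
    state[node] = 2;
    return false;
  }

  std::map<std::string, std::set<std::string>> edges_;
};
```

Feeding it the four edges from this report (M0 -> M1, M1 -> M2, M2 -> M3, M3 -> M0) returns {"M0", "M1", "M2", "M3"}, i.e. the 4-lock cycle; dropping any one edge makes the result empty.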

@amitanandaiyer added the area/docdb and status/awaiting-triage labels Feb 16, 2023
@yugabyte-ci added the kind/bug and priority/medium labels Feb 16, 2023
@amitanandaiyer
Contributor Author

2.log.gz

@bmatican
Contributor

@amitanandaiyer Is this a duplicate of #16006?

@amitanandaiyer
Contributor Author

19:34 $ zgrep -A 3 ThreadSanitizer  /net/dev-server-amitanand2/share/logs/repeat_unit_test/cassandra_cpp_driver-test/CppCassandraDriverTest.ConcurrentIndexUpdate/2023-02-16T19_24_38/*.log.gz | grep Cycle
/net/dev-server-amitanand2/share/logs/repeat_unit_test/cassandra_cpp_driver-test/CppCassandraDriverTest.ConcurrentIndexUpdate/2023-02-16T19_24_38/2.log.gz:[ts-3]   Cycle in lock order graph: M0 (0x7b6000040438) => M1 (0x7b4400090f70) => M2 (0x7b6400012288) => M3 (0x7b4c000d0970) => M0
/net/dev-server-amitanand2/share/logs/repeat_unit_test/cassandra_cpp_driver-test/CppCassandraDriverTest.ConcurrentIndexUpdate/2023-02-16T19_24_38/2.log.gz:[ts-3]   Cycle in lock order graph: M0 (0x7b6400012288) => M1 (0x7b4c000d0970) => M2 (0x7b4400090f70) => M0
/net/dev-server-amitanand2/share/logs/repeat_unit_test/cassandra_cpp_driver-test/CppCassandraDriverTest.ConcurrentIndexUpdate/2023-02-16T19_24_38/3.log.gz:[ts-3]   Cycle in lock order graph: M0 (0x7b4c00150c70) => M1 (0x7b60000c0c38) => M2 (0x7b4400080670) => M3 (0x7b6400012288) => M0
/net/dev-server-amitanand2/share/logs/repeat_unit_test/cassandra_cpp_driver-test/CppCassandraDriverTest.ConcurrentIndexUpdate/2023-02-16T19_24_38/4.log.gz:[ts-3]   Cycle in lock order graph: M0 (0x7b6400012288) => M1 (0x7b4c000c09f0) => M2 (0x7b440008a7b0) => M0
/net/dev-server-amitanand2/share/logs/repeat_unit_test/cassandra_cpp_driver-test/CppCassandraDriverTest.ConcurrentIndexUpdate/2023-02-16T19_24_38/4.log.gz:[ts-1]   Cycle in lock order graph: M0 (0x7b6400012288) => M1 (0x7b4c00150ab0) => M2 (0x7b4400081f70) => M0
/net/dev-server-amitanand2/share/logs/repeat_unit_test/cassandra_cpp_driver-test/CppCassandraDriverTest.ConcurrentIndexUpdate/2023-02-16T19_24_38/4.log.gz:[ts-3]   Cycle in lock order graph: M0 (0x7b6400012288) => M1 (0x7b4c000c09f0) => M2 (0x7b440008a7b0) => M0
/net/dev-server-amitanand2/share/logs/repeat_unit_test/cassandra_cpp_driver-test/CppCassandraDriverTest.ConcurrentIndexUpdate/2023-02-16T19_24_38/4.log.gz:[ts-2]   Cycle in lock order graph: M0 (0x7b6400012288) => M1 (0x7b4c000e0ab0) => M2 (0x7b440008b430) => M0
/net/dev-server-amitanand2/share/logs/repeat_unit_test/cassandra_cpp_driver-test/CppCassandraDriverTest.ConcurrentIndexUpdate/2023-02-16T19_24_38/5.log.gz:[ts-3]   Cycle in lock order graph: M0 (0x7b6400012288) => M1 (0x7b4c001309f0) => M2 (0x7b4400080530) => M0
/net/dev-server-amitanand2/share/logs/repeat_unit_test/cassandra_cpp_driver-test/CppCassandraDriverTest.ConcurrentIndexUpdate/2023-02-16T19_24_38/6.log.gz:[ts-3]   Cycle in lock order graph: M0 (0x7b6400012288) => M1 (0x7b4c000c09f0) => M2 (0x7b440017a030) => M0
/net/dev-server-amitanand2/share/logs/repeat_unit_test/cassandra_cpp_driver-test/CppCassandraDriverTest.ConcurrentIndexUpdate/2023-02-16T19_24_38/6.log.gz:[ts-1]   Cycle in lock order graph: M0 (0x7b6400012288) => M1 (0x7b4c00150ab0) => M2 (0x7b4400179c70) => M0
/net/dev-server-amitanand2/share/logs/repeat_unit_test/cassandra_cpp_driver-test/CppCassandraDriverTest.ConcurrentIndexUpdate/2023-02-16T19_24_38/6.log.gz:[ts-1]   Cycle in lock order graph: M0 (0x7b6400012288) => M1 (0x7b4c00150ab0) => M2 (0x7b4400179c70) => M0
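The cycle lines above do not all have the same length: some go through four mutexes (M0 to M3), others through three. Since the M-labels are numbered per report, counting the distinct labels on a line gives the cycle length. A small hypothetical helper for that, using `std::regex` (`CountLocksInCycle` is illustrative only, not part of any YugabyteDB tooling):

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <set>
#include <string>

// Hypothetical helper: counts the distinct mutex labels (M0, M1, ...) on a
// TSan "Cycle in lock order graph:" line.
std::size_t CountLocksInCycle(const std::string& line) {
  // Matches a label followed by its address, e.g. "M2 (0x7b6400012288)".
  // The trailing "=> M0" has no address after it, so it is not matched;
  // the std::set would deduplicate it anyway.
  static const std::regex kMutexLabel(R"(\bM(\d+) \()");
  std::set<std::string> labels;
  for (std::sregex_iterator it(line.begin(), line.end(), kMutexLabel), end;
       it != end; ++it) {
    labels.insert((*it)[1].str());
  }
  return labels.size();
}
```

For example, it returns 4 for the first cycle line from 2.log.gz and 3 for the second one.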

@amitanandaiyer
Contributor Author

@bmatican I don't think so. I see YBTransaction/TransactionManager in this report, which don't appear in the other issue.

@bmatican
Contributor

Hmm, the first stack in the linked issue is:

[ts-3]   Mutex M1 acquired here while holding mutex M0 in thread T85:
[ts-3]     #0 AnnotateRWLockAcquired /opt/yb-build/llvm/yb-llvm-v15.0.3-yb-1-1667341687-0b8d1183-centos7-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interface_ann.cpp:190:3 (yb-tserver+0xd05c9)
[ts-3]     #1 yb::rw_spinlock::lock() ${BUILD_ROOT}/../../src/yb/util/locks.h:129:5 (libyb_client.so+0x3bf407)
[ts-3]     #2 std::lock_guard<yb::rw_spinlock>::lock_guard[abi:v15003](yb::rw_spinlock&) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20221115221900-32d6b89b02-centos7-x86_64-clang15/installed/tsan/libcxx/include/c++/v1/__mutex_base:94:27 (libyb_client.so+0x3bf407)
[ts-3]     #3 yb::client::internal::RemoteTablet::MarkTServerAsLeader(yb::client::internal::RemoteTabletServer const*) ${BUILD_ROOT}/../../src/yb/client/meta_cache.cc:584:32 (libyb_client.so+0x3bf407)
[ts-3]     #4 yb::client::internal::TabletInvoker::Done(yb::Status*) ${BUILD_ROOT}/../../src/yb/client/tablet_rpc.cc:464:30 (libyb_client.so+0x42e6d1)
[ts-3]     #5 yb::client::(anonymous namespace)::TransactionRpcBase::Finished(yb::Status const&) ${BUILD_ROOT}/../../src/yb/client/transaction_rpc.cc:65:18 (libyb_client.so+0x47b29e)
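Both this trace and the one in this issue reach RemoteTablet's `rw_spinlock` from `TabletInvoker::Done`, called from `TransactionRpcBase::Finished`, while the same thread already holds another mutex (M0). A common way to break such a cycle is to never call into another component while holding your own lock: snapshot what you need under the lock, release it, then make the call. The sketch below illustrates that general pattern with hypothetical `TxnLike`/`TabletLike` classes and plain `std::mutex`; it is an assumption-based illustration, not the change made in e090e32.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins (not YugabyteDB classes): TxnLike owns a lock in the
// role of M0; TabletLike owns a lock in the role of RemoteTablet's spinlock.
class TabletLike;

class TxnLike {
 public:
  void Attach(TabletLike* tablet) { tablet_ = tablet; }

  // Called when an RPC completes; needs to update the tablet.
  void Finish();

  // Called by TabletLike; takes this object's lock.
  void OnTabletUpdate() {
    std::lock_guard<std::mutex> guard(mu_);
    ++updates_;
  }

  int updates() {
    std::lock_guard<std::mutex> guard(mu_);
    return updates_;
  }

 private:
  std::mutex mu_;
  TabletLike* tablet_ = nullptr;
  int finished_ = 0;
  int updates_ = 0;
};

class TabletLike {
 public:
  explicit TabletLike(TxnLike* txn) : txn_(txn) {}

  void MarkReplicaFailed() {
    std::lock_guard<std::mutex> guard(mu_);
    ++failures_;
  }

  // Tablet -> Txn direction: update local state under mu_, then notify the
  // transaction only after mu_ has been released.
  void Refresh() {
    {
      std::lock_guard<std::mutex> guard(mu_);
      ++refreshes_;
    }
    txn_->OnTabletUpdate();
  }

  int failures() {
    std::lock_guard<std::mutex> guard(mu_);
    return failures_;
  }

 private:
  std::mutex mu_;
  TxnLike* txn_;
  int failures_ = 0;
  int refreshes_ = 0;
};

// Txn -> Tablet direction. Calling MarkReplicaFailed() inside the locked scope
// would add the edge TxnLike::mu_ -> TabletLike::mu_; together with a caller
// holding TabletLike::mu_ while calling OnTabletUpdate(), that closes a cycle.
void TxnLike::Finish() {
  TabletLike* tablet = nullptr;
  {
    std::lock_guard<std::mutex> guard(mu_);
    ++finished_;
    tablet = tablet_;
  }
  assert(tablet != nullptr && "Attach() must be called before Finish()");
  tablet->MarkReplicaFailed();
}

// Runs both call directions concurrently. No thread ever holds both locks at
// once, so there is no lock-order edge between them and no deadlock.
void Hammer(TxnLike& txn, TabletLike& tablet, int iterations) {
  std::thread a([&] { for (int i = 0; i < iterations; ++i) txn.Finish(); });
  std::thread b([&] { for (int i = 0; i < iterations; ++i) tablet.Refresh(); });
  a.join();
  b.join();
}
```

The alternative fix, acquiring both locks in one global order everywhere, is often harder to retrofit when the two locks live in different layers (here, transaction code and the meta cache).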

@amitanandaiyer
Contributor Author

They could be related, but the other issue involves 3 locks; there are 4 locks involved here.

@yugabyte-ci added the priority/critical label and removed the priority/medium label Feb 24, 2023
@bmatican
Contributor

@ttyusupov I passed this one to you as well, since I assume it's related to the linked issue. Can you confirm?

@yugabyte-ci removed the status/awaiting-triage label Feb 24, 2023
@ttyusupov
Contributor

Should be fixed by e090e32, which resolves #16006.

4 participants