Commit
[BACKPORT 2.8][#10304] docdb: fix deadlock in ProcessTabletReportBatch
Summary:
Since commit b14485a ([#8229] backup: repartition table if needed on YSQL restore), there is a rare deadlock between ProcessTabletReportBatch and RepartitionTable. It can be hit when

- thread 1 (RepartitionTable): imports a YSQL snapshot where the number of tablets for a table differs between the cluster and the external snapshot
- thread 2 (ProcessTabletReportBatch): processes tablets of that table from a heartbeat

A more in-depth sequence of steps follows:

1. t1: table->LockForWrite (WriteLock)
1. t2: table->LockForRead (ReadLock)
1. t1: tablet->StartMutation (WriteLock)
2. t1: table_lock.Commit (UpgradeToCommitLock; blocks on t2 ReadLock)
3. t2: tablet->LockForWrite (WriteLock; blocks on t1 WriteLock)

To fix, for ProcessTabletReportBatch, take the table write lock instead of the read lock. The table metadata isn't mutated, so this is purely for deadlock avoidance (since only one writer is allowed at a time). Bogdan thinks we should expect the table write lock to be taken whenever a tablet write lock is taken.

Original Commit: cc90f01
Original Differential Revision: https://phabricator.dev.yugabyte.com/D13459

Test Plan:
    ./yb_build.sh \
      --cxx-test tools_yb-backup-test_ent \
      --gtest_filter YBBackupTest.TestYSQLChangeDefaultNumTablets \
      -n 1000 \
      --tp 1 \
      fastdebug

Reviewers: nicolas

Reviewed By: nicolas

Subscribers: bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D13730
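
The lock ordering behind the fix can be illustrated with a small standalone C++ model. This is a sketch, not YugabyteDB code: `RwcLock`, `Catalog`, `RepartitionTableModel`, `ProcessTabletReportBatchModel`, and `RunFixedOrdering` are hypothetical names, and the lock is a simplified read/write/commit lock that is assumed to allow readers alongside a single writer until the writer upgrades to commit.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical, simplified read/write/commit lock (an assumption, not the
// real implementation): readers may coexist with one writer; upgrading to
// commit blocks new readers and waits for existing readers to drain.
class RwcLock {
 public:
  void ReadLock() {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [this] { return !committing_; });
    ++readers_;
  }
  void ReadUnlock() {
    std::lock_guard<std::mutex> l(mu_);
    --readers_;
    cv_.notify_all();
  }
  void WriteLock() {
    std::unique_lock<std::mutex> l(mu_);
    cv_.wait(l, [this] { return !writer_; });  // only one writer at a time
    writer_ = true;
  }
  void UpgradeToCommitLock() {
    std::unique_lock<std::mutex> l(mu_);
    assert(writer_);  // caller must already hold the write lock
    committing_ = true;
    cv_.wait(l, [this] { return readers_ == 0; });
  }
  void Unlock() {  // releases a write lock, whether or not it was upgraded
    std::lock_guard<std::mutex> l(mu_);
    writer_ = false;
    committing_ = false;
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int readers_ = 0;
  bool writer_ = false;
  bool committing_ = false;
};

struct Catalog {
  RwcLock table;
  RwcLock tablet;
};

// Models thread 1: table write lock, tablet write lock, then commit both.
void RepartitionTableModel(Catalog& c) {
  c.table.WriteLock();
  c.tablet.WriteLock();
  c.table.UpgradeToCommitLock();
  c.table.Unlock();
  c.tablet.UpgradeToCommitLock();
  c.tablet.Unlock();
}

// Models thread 2 after the fix: it takes the table WRITE lock (before the
// fix this was a read lock), so it is serialized against thread 1.
void ProcessTabletReportBatchModel(Catalog& c) {
  c.table.WriteLock();
  c.tablet.WriteLock();
  c.tablet.UpgradeToCommitLock();
  c.tablet.Unlock();
  c.table.Unlock();
}

// Runs both models concurrently `rounds` times and returns how many
// thread bodies ran to completion.
int RunFixedOrdering(int rounds) {
  std::atomic<int> finished{0};
  for (int i = 0; i < rounds; ++i) {
    Catalog c;
    std::thread t1([&] { RepartitionTableModel(c); ++finished; });
    std::thread t2([&] { ProcessTabletReportBatchModel(c); ++finished; });
    t1.join();
    t2.join();
  }
  return finished.load();
}
```

Because both paths acquire the table write lock before any tablet lock, at most one of them holds a tablet write lock while the other is still waiting on the table. In this model, replacing the first `WriteLock` in `ProcessTabletReportBatchModel` with `ReadLock` (and the final `Unlock` with `ReadUnlock`) recreates the interleaving listed in the summary and can hang.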