-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] Correctly handle table/index splits in backup-restore #8229
Comments
Backed up a table with unique constraint. Restored on a cluster with tserver flag
@OlegLoginov suggests not using syntax to solve this because it's not possible for CREATE EXTENSION (like, what if creating extension involves tables, some which have been dynamically split, some not?). |
When trying to restore to a different database, I get a slightly different error:
This is according to the two cases of |
Summary: Sometimes, ysql_dump may not provide enough partitioning hints. Then, when importing snapshots, an error may be thrown for a mismatch on number of tablets between the snapshot table and ysql_dump-created table. Fix this by recreating partitions for the YSQL table in this case. Do not yet address the case where number of tablets are the same but partitions are split differently. In the implementation, do not worry about waiting for index tablets to finish creating on tservers since that isn't done for YCQL, which already recreates table, either. Add test YBBackupTest.TestYSQLChangeDefaultNumTablets and disabled test YBBackupTest.TestYSQLRangeSplitConstraint. Also, - add a VLOG for AsyncCreateReplica response - fix AddTabletUnlocked thread annotation from REQUIRES_SHARED to REQUIRES - fix ScopedTabletInfoCommitter to avoid double CommitMutation - move ScopedTabletInfoCommitter from anonymous namespace to catalog_entity_info.h so that ent::CatalogManager can use it - make ScopedTabletInfoCommitter a template class ScopedInfoCommitter so that it can later be used for other objects like TableInfo Do not close issue #8229 since there will be a part 2 handling partition schemas that differ on split points but have the same number of partitions. Test Plan: ./yb_build.sh --cxx-test tools_yb-backup-test_ent \ --gtest_filter YBBackupTest.TestYSQLChangeDefaultNumTablets Reviewers: nicolas, oleg Reviewed By: oleg Subscribers: ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D13122
Summary: Since commit b14485a ([#8229] backup: repartition table if needed on YSQL restore), there is a rare deadlock issue between ProcessTabletReportBatch and RepartitionTable. It can be hit when - thread 1 (RepartitionTable): import a YSQL snapshot where the number of tablets for a table mismatch between the cluster and the external snapshot - thread 2 (ProcessTabletReportBatch): process tablets of that table from heartbeat A more in-depth sequence of steps follows: 1. t1: table->LockForWrite (WriteLock) 1. t2: table->LockForRead (ReadLock) 1. t1: tablet->StartMutation (WriteLock) 2. t1: table_lock.Commit (UpgradeToCommitLock; blocks on t2 ReadLock) 3. t2: tablet->LockForWrite (WriteLock; blocks on t1 WriteLock) To fix, for ProcessTabletReportBatch, take table write lock instead of read lock. The table metadata isn't mutated, so this is purely for deadlock avoidance reasons (since only one writer is allowed at a time). Bogdan thinks we should expect table write lock to be taken whenever tablet write lock is taken. Close: #10304 Test Plan: ./yb_build.sh \ --cxx-test tools_yb-backup-test_ent \ --gtest_filter YBBackupTest.TestYSQLChangeDefaultNumTablets \ -n 1000 \ --tp 1 \ fastdebug Reviewers: bogdan, nicolas Reviewed By: nicolas Subscribers: ybase Differential Revision: https://phabricator.dev.yugabyte.com/D13459
Summary: Since commit b14485a ([#8229] backup: repartition table if needed on YSQL restore), there is a rare deadlock issue between ProcessTabletReportBatch and RepartitionTable. It can be hit when - thread 1 (RepartitionTable): import a YSQL snapshot where the number of tablets for a table mismatch between the cluster and the external snapshot - thread 2 (ProcessTabletReportBatch): process tablets of that table from heartbeat A more in-depth sequence of steps follows: 1. t1: table->LockForWrite (WriteLock) 1. t2: table->LockForRead (ReadLock) 1. t1: tablet->StartMutation (WriteLock) 2. t1: table_lock.Commit (UpgradeToCommitLock; blocks on t2 ReadLock) 3. t2: tablet->LockForWrite (WriteLock; blocks on t1 WriteLock) To fix, for ProcessTabletReportBatch, take table write lock instead of read lock. The table metadata isn't mutated, so this is purely for deadlock avoidance reasons (since only one writer is allowed at a time). Bogdan thinks we should expect table write lock to be taken whenever tablet write lock is taken. Original Commit: cc90f01 Original Differential Revision: https://phabricator.dev.yugabyte.com/D13459 Test Plan: ./yb_build.sh \ --cxx-test tools_yb-backup-test_ent \ --gtest_filter YBBackupTest.TestYSQLChangeDefaultNumTablets \ -n 1000 \ --tp 1 \ fastdebug Reviewers: nicolas Reviewed By: nicolas Subscribers: bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D13730
…tore (2) Summary: Commit b14485a handles backup restore for YSQL when number of tablets in the external snapshot doesn't match that in the cluster. Do part 2 of handling cases where the number of tablets match but the partition boundaries don't. This may cost some performance because partitions need to be inspected for each YSQL table. Master branch commit f9c8d2f ([#8148] docdb: Potential issue on crash after creating post-split tablets) is not in this backport 2.8 branch. To clarify what that commit does, it changes the way yb-master chooses the split point: rather than take the midway point between start and end, use the middle based on data in the tablet. Therefore, ManualTabletSplit test is adjusted so that the split point is taken as the middle disregarding data. Original Commit: 96beb9e Original Differential Revision: https://phabricator.dev.yugabyte.com/D13300 Test Plan: New test: ./yb_build.sh --cxx-test tools_yb-backup-test_ent \ --gtest_filter YBBackupTest.TestYSQLManualTabletSplit Run other YSQL backup/restore tests, and make sure that partitions don't appear to mismatch (i.e. make sure master log doesn't have "Partition boundaries mismatch for table"). Reviewers: oleg Reviewed By: oleg Subscribers: bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D13731
…tore Summary: Sometimes, ysql_dump may not provide enough partitioning hints. Then, when importing snapshots, an error may be thrown for a mismatch on number of tablets between the snapshot table and ysql_dump-created table. Fix this by recreating partitions for the YSQL table in this case. Do not yet address the case where number of tablets are the same but partitions are split differently. In the implementation, do not worry about waiting for index tablets to finish creating on tservers since that isn't done for YCQL, which already recreates table, either. Add test YBBackupTest.TestYSQLChangeDefaultNumTablets and disabled test YBBackupTest.TestYSQLRangeSplitConstraint. Also, - add a VLOG for AsyncCreateReplica response - fix AddTabletUnlocked thread annotation from REQUIRES_SHARED to REQUIRES - fix ScopedTabletInfoCommitter to avoid double CommitMutation - move ScopedTabletInfoCommitter from anonymous namespace to catalog_entity_info.h so that ent::CatalogManager can use it - make ScopedTabletInfoCommitter a template class ScopedInfoCommitter so that it can later be used for other objects like TableInfo Do not close issue #8229 since there will be a part 2 handling partition schemas that differ on split points but have the same number of partitions. For backport to 2.6, besides adjusting to API differences, do the following: - Add num_tablets mismatch check since commit 2f9d4dc ([backup] Added a check for 'num_tablets' in CatalogManager::ImportSnapshot.) is not in this backport 2.6 branch. Also add part of commit aa394e8 ([#9933][Part-0] Update logic for using num_tablets from internal or user requests.) touching this because it is needed due to backport commit ac653d2 ([Backport 2.6][#9933][YCQL][Part-1] DESC TABLE does not directly match the "CREATE TABLE" command for number of tablets.). - Change ycql_num_tablets and ysql_num_tablets flags to yb_num_shards_per_tserer and ysql_num_shards_per_tserver in the test since the former flags aren't available in this backport 2.6 branch. Since these flags are only granular to multiples of the number of tservers (3), adjust the later expected number of tablets from 2 to 6. - Add REQUIRES(lock_) to AddTabletUnlocked Original Commit: b14485a Orignal Differential Revision: https://phabricator.dev.yugabyte.com/D13122 Test Plan: ./yb_build.sh --cxx-test client_backup-txn-test ./yb_build.sh --cxx-test integration-tests_snapshot-test ./yb_build.sh --cxx-test tools_yb-admin-test_ent ./yb_build.sh --cxx-test tools_yb-backup-test_ent ./yb_build.sh --java-test org.yb.cql.TestYbBackup Reviewers: oleg, nicolas Reviewed By: nicolas Subscribers: bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D13851
Summary: Since commit b14485a ([#8229] backup: repartition table if needed on YSQL restore), there is a rare deadlock issue between ProcessTabletReportBatch and RepartitionTable. It can be hit when - thread 1 (RepartitionTable): import a YSQL snapshot where the number of tablets for a table mismatch between the cluster and the external snapshot - thread 2 (ProcessTabletReportBatch): process tablets of that table from heartbeat A more in-depth sequence of steps follows: 1. t1: table->LockForWrite (WriteLock) 1. t2: table->LockForRead (ReadLock) 1. t1: tablet->StartMutation (WriteLock) 2. t1: table_lock.Commit (UpgradeToCommitLock; blocks on t2 ReadLock) 3. t2: tablet->LockForWrite (WriteLock; blocks on t1 WriteLock) To fix, for ProcessTabletReportBatch, take table write lock instead of read lock. The table metadata isn't mutated, so this is purely for deadlock avoidance reasons (since only one writer is allowed at a time). Bogdan thinks we should expect table write lock to be taken whenever tablet write lock is taken. For backport to 2.6, since commit afd8775 ([#9182] Fix return from CatalogManager::ProcessTabletReport) is not present, do changes to ProcessTabletReport instead. Depends on D13851 Original Commit: cc90f01 Original Differential Revision: https://phabricator.dev.yugabyte.com/D13459 Test Plan: ./yb_build.sh \ --cxx-test tools_yb-backup-test_ent \ --gtest_filter YBBackupTest.TestYSQLChangeDefaultNumTablets \ -n 1000 \ --tp 1 \ fastdebug Reviewers: nicolas Reviewed By: nicolas Subscribers: bogdan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D13852
When backing up a YSQL database we store both the (YSQL) metadata and the DocDB snapshots.
When restoring we re-create the YSQL objects, and then match the snapshots data for each to restore the actual data.
In that process we currently check that the number of splits of the YSQL objects (tables/indexes) re-created from the metadata match with those from the corresponding snapshot.
To avoid issues there, we added the
SPLIT INTO/AT
clause for ysql_dump which ensures the objects get re-created with the correct number of splits.However that has a few exceptions. Known ones are:
These can lead to mismatch between the tablet splits and therefore an error on restore.
This can be fixed individually by adding splitting information to the YSQL dump (e.g. SPLIT INTO/AT syntax), or generically by inferring the number of splits from the snapshot itself.
For instance, when the number of splits does not match, we could re-create the objects with the correct number of splits from the snapshot rather than erroring out.
The text was updated successfully, but these errors were encountered: