Potential issue on crash after creating post-split tablets #8148

Closed
ttyusupov opened this issue Apr 23, 2021 · 1 comment
Labels: kind/bug (This issue is a bug), priority/high (High Priority)

ttyusupov commented Apr 23, 2021

Scenario:

  1. The TServer sends split_key_1 and tablet_id to the leader master to request a split.
  2. CatalogManager::DoSplitTablet is called on the master side.
  3. Child tablets are registered with partitions based on split_key_1.
  4. The leader master crashes.
  5. A new leader master takes over.
  6. The TServer decides again that tablet_id requires a split, but this time it calculates a different key, split_key_2 (this can happen if a workload is running and the distribution of keys changes between step 1 and this step).
  7. CatalogManager::DoSplitTablet is called on the master side with split_key_2 and tablet_id.
  8. CatalogManager::DoSplitTablet doesn't create new child tablets, because they already exist.
  9. CatalogManager::DoSplitTablet still calls SendSplitTabletRequest, but passes the encoded and partition keys for split_key_2 instead of split_key_1.

Expected result: The child tablets' partition key boundary should match the split key that the tserver uses to do the actual tablet split.
Actual result: The child tablets have a partition key boundary based on split_key_1, but the tserver does the actual split based on split_key_2.
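
The mismatch in the scenario above can be illustrated with a small model. This is a minimal sketch, not the CatalogManager code: `MasterModel`, `SplitUnguarded`, and `SplitGuarded` are hypothetical names, and the guard shown (reuse the key the children were registered with) is only one possible way to keep both sides consistent, not necessarily what the eventual fix does.

```cpp
#include <map>
#include <string>

// Hypothetical model of the master's persisted state: for each parent tablet,
// the split key that the registered child partitions were built from.
struct MasterModel {
  std::map<std::string, std::string> child_split_key_by_parent;

  // Behavior described in the scenario: children are registered only once,
  // but the key forwarded to the tserver is always the one in the request.
  std::string SplitUnguarded(const std::string& tablet_id,
                             const std::string& requested_key) {
    // emplace does nothing if children for tablet_id are already registered.
    child_split_key_by_parent.emplace(tablet_id, requested_key);
    return requested_key;
  }

  // Guarded variant (an assumption, for illustration): if children already
  // exist, forward the key they were created with and ignore the new one.
  std::string SplitGuarded(const std::string& tablet_id,
                           const std::string& requested_key) {
    auto it = child_split_key_by_parent.emplace(tablet_id, requested_key).first;
    return it->second;
  }
};
```

In this model, a second `SplitUnguarded` call with split_key_2 returns split_key_2 while the stored child boundary remains split_key_1, which is the inconsistency reported here. `SplitGuarded` returns split_key_1 in the same situation.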

@ttyusupov ttyusupov added the kind/bug This issue is a bug label Apr 23, 2021
@ttyusupov ttyusupov added the priority/high High Priority label Jul 30, 2021
@robertsami robertsami assigned robertsami and unassigned hulien22 Sep 20, 2021
nimwijetunga added a commit that referenced this issue Oct 18, 2021
…lets

Summary: We may create two tablet partitions with different split partition keys (upon a split request) if the master node crashes.

Test Plan:
Uses an integration test that:

  # Creates a single tablet and attempts to split it (which should fail because of a flag set in the test which crashes the master)
  # Tries to re-split the same tablet again (should succeed since the master does not crash)
  # Checks that once the split is complete the boundary is consistent between the two split tablets
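
The final check in the steps above can be thought of as a predicate over the parent and child partitions. The following is a simplified sketch only: `Partition` and `ChildrenMatchSplit` are hypothetical names, keys are modeled as plain strings, and the actual integration test may verify the boundary differently.

```cpp
#include <string>

// Hypothetical, simplified partition covering keys in [start, end).
// An empty string stands for an unbounded side.
struct Partition {
  std::string start;
  std::string end;
};

// Returns true if the two children exactly cover the parent's range and
// meet at the split key that was actually used for the split.
bool ChildrenMatchSplit(const Partition& parent,
                        const Partition& left,
                        const Partition& right,
                        const std::string& split_key) {
  return left.start == parent.start &&
         left.end == split_key &&
         right.start == split_key &&
         right.end == parent.end;
}
```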

To run the test use the following command:

```
./yb_build.sh -n 10 --cxx-test integration-tests_tablet-split-itest --gtest_filter TabletSplitExternalMiniClusterITest.CrashMasterCheckConsistentPartitionKeys
```

Reviewers: timur, rsami

Reviewed By: timur, rsami

Subscribers: bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D13359
@nimwijetunga

Closed by commit: f9c8d2f

jaki added a commit that referenced this issue Nov 2, 2021
…tore (2)

Summary:
Commit b14485a handles backup restore
for YSQL when number of tablets in the external snapshot doesn't match
that in the cluster.  Do part 2 of handling cases where the number of
tablets matches but the partition boundaries don't.  This may cost some
performance because partitions need to be inspected for each YSQL table.

Master branch commit f9c8d2f ([#8148]
docdb: Potential issue on crash after creating post-split tablets) is
not in this backport 2.8 branch.  To clarify what that commit does, it
changes the way yb-master chooses the split point: rather than take the
midway point between start and end, use the middle based on data in the
tablet.  Therefore, ManualTabletSplit test is adjusted so that the split
point is taken as the middle disregarding data.
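
The difference between the two ways of choosing a split point described above can be sketched as follows. This is an illustration only: `RangeMidpoint` and `DataMedian` are hypothetical helpers, modeling keys as 16-bit integers is an assumption made for brevity, and neither function reflects how the split point is actually computed in the code base.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Midpoint of the key range [start, end), disregarding the data.
uint16_t RangeMidpoint(uint16_t start, uint16_t end) {
  return static_cast<uint16_t>(start + (end - start) / 2);
}

// Median of the keys actually present, so the split point follows the data.
// Precondition: keys is non-empty. Takes a copy because nth_element reorders.
uint16_t DataMedian(std::vector<uint16_t> keys) {
  auto mid = keys.begin() + keys.size() / 2;
  std::nth_element(keys.begin(), mid, keys.end());
  return *mid;
}
```

With a skewed key distribution the two can differ widely, which is why a test that assumes one of them needs adjusting when the other is in effect.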

Original Commit: 96beb9e

Original Differential Revision: https://phabricator.dev.yugabyte.com/D13300

Test Plan:
New test:

    ./yb_build.sh --cxx-test tools_yb-backup-test_ent \
      --gtest_filter YBBackupTest.TestYSQLManualTabletSplit

Run other YSQL backup/restore tests, and make sure that partitions don't
appear to mismatch (i.e. make sure master log doesn't have "Partition
boundaries mismatch for table").

Reviewers: oleg

Reviewed By: oleg

Subscribers: bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D13731