Fix bug that cluster balance may cause load job failed #1581

Merged (5 commits) on Aug 8, 2019

@morningman morningman commented Aug 4, 2019

The bug is described in issue #1580. This patch fixes two cluster balance cases:

  1. After a new replica has been added, its version may not have caught up with
    the visible version, so the new replica may be treated as a stale, redundant replica and
    deleted at the next tablet checking round.

    I add a mark named needFurtherRepair to the newly added replica. It is set only when that replica's version has not caught up with the visible version. A marked replica receives a further repair at the next tablet checking round instead of being deleted.

  2. Redundant replicas may still have load jobs running on them when they are deleted. Deleting such replicas may cause those load jobs to fail.

    Before deleting a redundant replica, I first record the next txn id on that replica and set the replica's
    state to CLONE. The CLONE state ensures that no new load jobs are placed on that replica, and we
    wait for all load jobs before the marked txn id to finish. After that, the replica can be deleted safely.
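
The two cases above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this patch: only `needFurtherRepair` and the `CLONE` state come from the description, while the class names, the `watermarkTxnId` field, the `Action` enum, and every method name are hypothetical. It also simplifies the checker so that any lagging replica without the mark counts as redundant.

```java
import java.util.Set;

// Hypothetical, simplified replica model; not the actual Doris class.
class ReplicaSketch {
    enum State { NORMAL, CLONE }

    long version;
    State state = State.NORMAL;
    // Case 1: set when a newly added replica still lags the visible version.
    boolean needFurtherRepair = false;
    // Case 2: next txn id recorded before deletion; -1 means not recorded.
    long watermarkTxnId = -1;

    ReplicaSketch(long version) {
        this.version = version;
    }
}

// Hypothetical tablet-checking logic covering both cases.
class TabletCheckSketch {
    enum Action { KEEP, REPAIR, DELETE_AS_REDUNDANT }

    // Case 1: called once a new replica has been added.
    static void onReplicaAdded(ReplicaSketch replica, long visibleVersion) {
        replica.needFurtherRepair = replica.version < visibleVersion;
    }

    // Case 1: what the next checking round does with a single replica.
    static Action check(ReplicaSketch replica, long visibleVersion) {
        if (replica.version >= visibleVersion) {
            replica.needFurtherRepair = false; // caught up, clear the mark
            return Action.KEEP;
        }
        // A marked replica is repaired again instead of being deleted.
        return replica.needFurtherRepair ? Action.REPAIR : Action.DELETE_AS_REDUNDANT;
    }

    // Case 2, step 1: record the next txn id and move the replica to CLONE
    // so that no new load job is placed on it.
    static void markForDeletion(ReplicaSketch replica, long nextTxnId) {
        replica.watermarkTxnId = nextTxnId;
        replica.state = ReplicaSketch.State.CLONE;
    }

    // Assumption: load jobs only choose replicas in NORMAL state.
    static boolean acceptsNewLoad(ReplicaSketch replica) {
        return replica.state == ReplicaSketch.State.NORMAL;
    }

    // Case 2, step 2: deletion is safe once no running txn is older than
    // the recorded txn id.
    static boolean canDelete(ReplicaSketch replica, Set<Long> runningTxnIds) {
        if (replica.state != ReplicaSketch.State.CLONE || replica.watermarkTxnId < 0) {
            return false;
        }
        for (long txnId : runningTxnIds) {
            if (txnId < replica.watermarkTxnId) {
                return false; // an earlier load job may still use this replica
            }
        }
        return true;
    }
}
```

In this sketch, a checker would call `markForDeletion` in one round and retry `canDelete` in later rounds until it returns true, so the deletion is deferred rather than blocking on running load jobs.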

ISSUE: #1580

@morningman morningman merged commit 69de5df into apache:master Aug 8, 2019
@imay imay mentioned this pull request Sep 26, 2019