Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoadBalance] make BeLoadRebalancer extends from base class Rebalancer #4771

Merged
merged 11 commits into from
Nov 3, 2020

Conversation

vagetablechicken
Copy link
Contributor

@vagetablechicken vagetablechicken commented Oct 20, 2020

Proposed changes

Create a base class Rebalancer, support to delete the specified replica(actually add a new priority in handling redundant replica).
The origin LoadBalancer, which will named as BeLoadRebalancer, just need to extend from Rebalancer without any logic change.

Types of changes

  • Code refactor (Modify the code structure, format the code, etc...)

Checklist

@kangkaisen
Copy link
Contributor

In Apache Druid. https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/coordinator/BalancerStrategy.java

pickSegmentToMove will choose which tablet replica to balance, naturally, the BE for the tablet replica is the src backend.

findNewSegmentHomeBalancer will choose the dest backend.

I think the Apache Druid BalancerStrategy interface logic and naming is clear.

@vagetablechicken
Copy link
Contributor Author

How to choose the dest be is core for balance strategy, if the base class doesn't limit it, what's the meaning of balance base class.

It's based on the specific strategies. Some strategies can delay choosing the dest be until create a clone task, but some
strategies can't . So we don't overemphasize this.
The meaning of balance base class is to support delete the src replica, which the current class can't do.

In Apache Druid. https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/server/coordinator/BalancerStrategy.java

pickSegmentToMove will choose which tablet replica to balance, naturally, the BE for the tablet replica is the src backend.

findNewSegmentHomeBalancer will choose the dest backend.

I think the Apache Druid BalancerStrategy interface logic and naming is clear.

We want to import the kudu's rebalance strategy, it'll calculate a concrete move{partition id, src be, dest be}, so it's no need to separate into choose one tablet and src be & find dest be.

In Doris, rebalance has 2 steps:

  1. select tablet, and clone a replica on dest be, so the tablet is redundant
  2. delete redundant replica

The key point is, step 2 will generate another TabletSchedCtx , we can't know which replica is the replica we want to delete. So we should try to get it from rebalancer.

@kangkaisen
Copy link
Contributor

@vagetablechicken Hi, I see what's your mean.

I agree with your interface logic. But for the method naming, I think we would better rename it.

For selectAlternativeTabletsForCluster method, the nameing or comment should let other people know we
select {tablet id, src be, dest be} at one time in this method.

For getSrcReplicaId, rename it to getDeleteReplicaId ?

@morningman
Copy link
Contributor

I have a question about the problem described in #4763. If we implement a new LoadBalancer that allow users to specify a tablet to migrate from node A to node B. But after the migration is completed, how to ensure that this tablet will not be migrated back again by the default balance strategy?

@morningman morningman added area/balance Issues or PRs related to data balance kind/improvement labels Oct 31, 2020
@kangkaisen
Copy link
Contributor

I have a question about the problem described in #4763. If we implement a new LoadBalancer that allow users to specify a tablet to migrate from node A to node B. But after the migration is completed, how to ensure that this tablet will not be migrated back again by the default balance strategy?

I think in one cluster, we could only enable one balance strategy?

@vagetablechicken
Copy link
Contributor Author

I have a question about the problem described in #4763. If we implement a new LoadBalancer that allow users to specify a tablet to migrate from node A to node B. But after the migration is completed, how to ensure that this tablet will not be migrated back again by the default balance strategy?

I think in one cluster, we could only enable one balance strategy?

Yes, only one.

Copy link
Contributor

@kangkaisen kangkaisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kangkaisen kangkaisen added the approved Indicates a PR has been approved by one committer. label Nov 2, 2020
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kangkaisen kangkaisen merged commit 80d5f6e into apache:master Nov 3, 2020
@yangzhg yangzhg mentioned this pull request Feb 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. area/balance Issues or PRs related to data balance kind/improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] refactor the LoadBalancer to support more rebalance strategies
4 participants