Schedule counts balancing-related partition movements in partition balancer #11366
afa0f1d to 79f3508
This looks really good. I am wondering if we can remove the update finishing logic from the members backend now?
We need to get rid of the recommission-related cancellation logic first.
The merge-base changed after approval.
3fba0c5 to d6ea177
We can rely on partition allocator state to detect the case when all necessary node drain actions have already been done or scheduled.
The algorithm is simple: just try to move every replica. Assuming that, due to the min-count soft constraint, it will end up on a node with fewer replicas than the original one, we will move towards a (local) optimum. This operator is triggered only if we have a non-empty "nodes-to-rebalance" collection in partition_balancer_state. It will be updated by members_manager and will contain all nodes that have been added but haven't yet been issued a corresponding finish_reallocations command.
We need to try moving partition replicas in random order to avoid creating topic hotspots after counts rebalancing (i.e. when some of the topic partitions are not present on added nodes).
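The counts-balancing pass described in the two commit messages above can be sketched roughly as follows. This is an illustrative Python model, not the Redpanda C++ code: the function name `rebalance_counts`, the `replicas` mapping, and the explicit "destination ends with fewer replicas than the source had" check are assumptions made for the sketch. In the real planner the min-count preference is a soft constraint inside the partition allocator.

```python
import random
from collections import Counter


def rebalance_counts(replicas, nodes, seed=None):
    """Try to move every replica once, visiting replicas in random order.

    replicas: dict mapping (partition, replica_index) -> node id (hypothetical model).
    nodes: iterable of all node ids, including newly added empty ones.
    Returns a list of (replica_key, source, destination) moves.
    """
    rng = random.Random(seed)
    counts = Counter({n: 0 for n in nodes})
    counts.update(replicas.values())

    keys = list(replicas)
    rng.shuffle(keys)  # random order, so moves are not concentrated on one topic

    moves = []
    for key in keys:
        partition, _ = key
        src = replicas[key]
        # Nodes already hosting a replica of this partition are not candidates.
        taken = {n for (p, _), n in replicas.items() if p == partition}
        candidates = [n for n in counts if n not in taken]
        if not candidates:
            continue
        dst = min(candidates, key=lambda n: (counts[n], n))
        # Stand-in for the min-count soft constraint: accept the move only if
        # the destination ends up with fewer replicas than the source had.
        if counts[dst] + 1 < counts[src]:
            replicas[key] = dst
            counts[src] -= 1
            counts[dst] += 1
            moves.append((key, src, dst))
    return moves
```

With six single-replica partitions split across nodes 0 and 1 and an empty node 2, this sketch moves two replicas to node 2 and then stops, because no further move improves the counts.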
d6ea177 to c45d4f0
Rebased on dev and added another change: tightened the way we handle per-term state in the backend. This way it is easier to remember to reset it when the term changes.
c45d4f0 to 17045c1
Change in force-push: added a controller term check when issuing finish_reallocations commands.
A term check helps prevent inconsistencies caused by several rebalancing processes running in different controller terms. Additionally, remove the ability to call this on a non-controller-leader node, because this method is not user-facing and is only used by rebalancing backends.
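A minimal sketch of what such a term check could look like, assuming the caller passes the term in which it planned the command. The `Controller` class, the `TermMismatch` exception, and the `expected_term` parameter are hypothetical names for illustration, not the actual Redpanda interface.

```python
class TermMismatch(Exception):
    """Raised when a command was prepared in a different controller term."""


class Controller:
    """Toy stand-in for the controller leader (hypothetical model)."""

    def __init__(self, term):
        self.term = term
        self.finished_nodes = set()

    def finish_reallocations(self, node_id, expected_term):
        # Reject commands issued by a rebalancing process from another term.
        if expected_term != self.term:
            raise TermMismatch(
                f"expected term {expected_term}, current term {self.term}")
        self.finished_nodes.add(node_id)
```

A backend that planned its work in term 5 would be rejected once the controller has moved on to term 6, instead of silently applying a stale decision.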
Some of the state in the balancer backend becomes obsolete when the term changes. To make it easy to reset all this state in one go, gather it in a single struct.
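The idea of gathering per-term state in one struct might look like the sketch below. The field names (`planned_moves`, `last_tick_ok`) and the `BalancerBackend.state_for` helper are assumptions for illustration; the point is only that replacing the whole struct resets everything at once.

```python
from dataclasses import dataclass, field


@dataclass
class PerTermState:
    """State that is only meaningful within one controller term (hypothetical fields)."""
    term: int
    planned_moves: list = field(default_factory=list)
    last_tick_ok: bool = False


class BalancerBackend:
    def __init__(self):
        self._state = None

    def state_for(self, current_term):
        # Replacing the whole struct resets every per-term field in one go,
        # so a newly added field cannot be forgotten on a term change.
        if self._state is None or self._state.term != current_term:
            self._state = PerTermState(term=current_term)
        return self._state
```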
Counts rebalancing is now handled by partition balancer.
Since members_backend no longer computes decommission-related reassignments itself, we can simplify the decommission finish condition and simply use the partition allocator to check that the node is empty.
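The simplified finish condition amounts to a single emptiness check. In this sketch the `allocated_replicas` mapping is a hypothetical stand-in for partition allocator state, assumed to already reflect scheduled moves.

```python
def decommission_finished(allocated_replicas, node_id):
    """Return True once the allocator reports no replicas on the node.

    allocated_replicas: dict node id -> replica count, a stand-in for
    partition allocator state (assumption, not the real interface).
    """
    return allocated_replicas.get(node_id, 0) == 0
```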
17045c1 to 0120266
/ci-repeat 2
Move counts balancing code from members_backend to the partition balancer.
The algorithm is as follows:
- ... reallocations_finished command hasn't been issued yet. If this list is not empty, run the balancing operator in the partition balancer planner
- ... reallocations_finished commands.
On-demand rebalancing is moved to the balancer as well.
Because counts-based rebalancing sometimes generates moves that do not result in an improvement, a revert API is implemented to discard them.
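One way to picture the revert mechanism is a reallocation object that remembers how to undo its last move. The `Reallocation` class, `revert_last`, and `try_move` below are hypothetical names for a simplified model, and the improvement test is the same assumed count comparison rather than the planner's actual scoring.

```python
class Reallocation:
    """Tentative replica set with an undo log (hypothetical model)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._undo = []

    def move(self, src, dst):
        idx = self.replicas.index(src)
        self._undo.append((idx, src))
        self.replicas[idx] = dst

    def revert_last(self):
        idx, src = self._undo.pop()
        self.replicas[idx] = src


def try_move(realloc, counts, src, dst):
    """Apply src -> dst, then revert it if node counts did not improve."""
    realloc.move(src, dst)
    if counts[dst] + 1 < counts[src]:
        counts[src] -= 1
        counts[dst] += 1
        return True
    realloc.revert_last()  # discard a move that brought no improvement
    return False
```

Keeping the revert on the reallocation object means the caller can generate a move optimistically and drop it without touching any other planned state.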
Backports Required
Release Notes
Improvements