Schedule counts balancing-related partition movements in partition balancer #11366

ztlpn · 2023-06-12T18:41:56Z

Move counts balancing code from members_backend to the partition balancer.

The algorithm is as follows:

Maintain a list of nodes for which the reallocations_finished command hasn't been issued yet. If this list is not empty, run the counts balancing operator in the partition balancer planner.
The planner goes over all replicas and tries to move each one to a better-placed node.
If no actions were added, we are at a (local) minimum and rebalancing is finished, so we can issue the reallocations_finished commands.
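
The three steps above can be sketched as a single balancer tick. This is a minimal Python illustration, not the Redpanda C++ code: `BalancerState`, `plan_counts_rebalance` and `balancer_tick` are hypothetical names, and the planner here only looks at per-node replica counts.

```python
from dataclasses import dataclass, field


@dataclass
class BalancerState:
    # Nodes that were added but have not been issued a finish command yet.
    nodes_to_rebalance: set = field(default_factory=set)
    # Nodes for which the (hypothetical) finish command was issued.
    finished: list = field(default_factory=list)


def plan_counts_rebalance(replicas_per_node):
    """Hypothetical planner: propose (source, target) moves that narrow the
    gap between the most and the least loaded node."""
    counts = dict(replicas_per_node)
    actions = []
    while True:
        src = max(counts, key=counts.get)
        dst = min(counts, key=counts.get)
        # A move only helps if the gap is at least 2 replicas.
        if counts[src] - counts[dst] < 2:
            return actions
        counts[src] -= 1
        counts[dst] += 1
        actions.append((src, dst))


def balancer_tick(state, replicas_per_node):
    """Run the operator only while some node still awaits rebalancing."""
    if not state.nodes_to_rebalance:
        return []
    actions = plan_counts_rebalance(replicas_per_node)
    if not actions:
        # Nothing left to improve: (local) minimum reached, so mark the
        # pending nodes as finished.
        state.finished.extend(sorted(state.nodes_to_rebalance))
        state.nodes_to_rebalance.clear()
    return actions
```

The caller is expected to apply the returned actions before the next tick; a tick that returns no actions is what ends the rebalancing for the pending nodes.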

On-demand rebalancing is moved to the balancer as well.

Counts-based rebalancing sometimes generates moves that do not result in an improvement, so a revert API is implemented to discard them.
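
A rough illustration of how such a revert could be used, assuming a toy allocator that only tracks replica counts per node. `CountsAllocator`, `move`, `revert` and `try_move` are hypothetical names for this sketch and do not reflect the actual partition_allocator interface.

```python
class CountsAllocator:
    """Toy allocator state: node id -> number of replicas on that node."""

    def __init__(self, counts):
        self.counts = dict(counts)

    def move(self, src, dst):
        # Tentatively account for one replica moving from src to dst.
        self.counts[src] -= 1
        self.counts[dst] += 1
        return (src, dst)

    def revert(self, move):
        # Undo a tentative move so the state matches what it was before.
        src, dst = move
        self.counts[dst] -= 1
        self.counts[src] += 1


def try_move(alloc, src, dst):
    """Apply a tentative move and keep it only if the gap between the two
    nodes shrinks; otherwise revert it and return None."""
    before = abs(alloc.counts[src] - alloc.counts[dst])
    move = alloc.move(src, dst)
    after = abs(alloc.counts[src] - alloc.counts[dst])
    if after >= before:
        alloc.revert(move)
        return None
    return move
```

Moving one replica between nodes holding 3 and 2 replicas merely swaps the imbalance, so `try_move` reverts it; moving between nodes holding 4 and 0 replicas is kept.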

Backports Required

Release Notes

Improvements

Partition movements related to rebalancing on node addition now take free disk space and health of target nodes into account.

mmaslankaprv · 2023-06-14T09:03:47Z

This looks really good. I am wondering if we can remove the update finishing logic from the members backend now?

ztlpn · 2023-06-14T11:18:19Z

I am wondering if we can remove the update finishing logic from the members backend now?

We need to get rid of the recommission-related cancellations logic first.

The merge-base changed after approval.

We can rely on partition allocator state to detect the case when all necessary node drain actions already have been done or scheduled.

The algorithm is simple: just try to move every replica. Assuming that, due to the min-count soft constraint, each replica ends up on a node with fewer replicas than the original one, we will move towards a (local) optimum. This operator is triggered only if we have a non-empty "nodes-to-rebalance" collection in partition_balancer_state. The collection is updated by members_manager and contains all nodes that have been added but haven't yet been issued a corresponding finish_reallocations command.
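
One pass of the "try to move every replica" idea might look like the sketch below. It assumes the min-count preference is approximated by picking the least loaded node that does not already host the partition; `rebalance_pass` and its data layout are hypothetical, and the real planner applies further constraints that are not modeled here.

```python
def rebalance_pass(assignments, nodes):
    """assignments: partition id -> list of node ids hosting its replicas.
    Mutates assignments in place and returns the list of
    (partition, source, target) moves made during the pass."""
    counts = {n: 0 for n in nodes}
    for replicas in assignments.values():
        for n in replicas:
            counts[n] += 1

    moves = []
    for partition, replicas in assignments.items():
        for i, src in enumerate(replicas):
            # A node may hold at most one replica of a given partition.
            candidates = [n for n in nodes if n not in replicas]
            if not candidates:
                continue
            # Stand-in for the min-count soft constraint: prefer the node
            # with the fewest replicas.
            dst = min(candidates, key=lambda n: counts[n])
            # Keep the move only if the target still has fewer replicas
            # than the source had, i.e. the move is a strict improvement.
            if counts[dst] + 1 < counts[src]:
                replicas[i] = dst
                counts[src] -= 1
                counts[dst] += 1
                moves.append((partition, src, dst))
    return moves
```

A pass that returns no moves corresponds to the local optimum at which the pending nodes can be marked as finished.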

We need to try moving partition replicas in random order to avoid creating topic hotspots after counts rebalancing (i.e. when some of the topic partitions are not present on added nodes).
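
To illustrate why the order matters, the sketch below compares which topics would be touched first under a fixed order and under a shuffled one. The helper names are hypothetical, and the (topic, partition index) tuples are an assumption made for this example.

```python
import random
from collections import Counter


def partitions_in_random_order(partitions, rng=None):
    """Return a shuffled copy so no topic is systematically visited first."""
    rng = rng or random.Random()
    order = list(partitions)
    rng.shuffle(order)
    return order


def topics_in_prefix(order, budget):
    """Count, per topic, how many of the first `budget` partitions belong to
    it. Each partition is a (topic, partition_index) tuple."""
    return dict(Counter(topic for topic, _ in order[:budget]))
```

With partitions listed topic by topic, a pass that can only move a limited number of replicas would take all of them from the first topic, leaving the other topics absent from the new node. A shuffled order spreads that budget across topics on average.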

ztlpn · 2023-06-14T22:43:38Z

Rebased on dev and added another change: tightened the way we handle per-term state in the backend. This way it is easier to remember to reset it when the term changes.

ztlpn · 2023-06-15T11:21:59Z

Change in force-push: added controller term check when issuing finish_reallocations commands.

Term check can help to prevent possible inconsistencies from several rebalancing processes running in different controller terms. Additionally, remove the ability to call this on a non-controller leader node, because this method is not user-facing and only used by rebalancing backends.
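
A minimal sketch of such a guard, assuming the caller passes the term in which it planned the rebalancing. `ToyController` and its fields are hypothetical and only model the two checks described above.

```python
class ToyController:
    """Toy model of the controller-side checks for a finish command."""

    def __init__(self, term, is_leader):
        self.term = term
        self.is_leader = is_leader
        self.finished_nodes = set()

    def finish_reallocations(self, node_id, expected_term):
        # Only the controller leader accepts the request; there is no
        # redirect from other nodes in this model.
        if not self.is_leader:
            return False
        # Reject requests planned in an older (or otherwise different) term.
        if expected_term != self.term:
            return False
        self.finished_nodes.add(node_id)
        return True
```

A request carrying a stale term is rejected, so a rebalancing process that started under a previous leadership cannot mark a node as finished on behalf of the current one.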

Some of the state in the balancer backend becomes obsolete when the term changes. To make it easy to reset all this state in one go, gather it in a single struct.
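
The idea can be sketched as follows. The field names in `PerTermState` are invented for illustration and are not taken from the balancer backend.

```python
from dataclasses import dataclass, field


@dataclass
class PerTermState:
    # Everything here is only meaningful within one controller term.
    term: int
    last_tick_in_progress_moves: int = 0          # hypothetical field
    pending_finish_nodes: set = field(default_factory=set)  # hypothetical field


class ToyBackend:
    def __init__(self):
        self._state = None

    def state_for(self, term):
        """Return the per-term state, replacing it wholesale on term change
        so that no individual field can be forgotten during the reset."""
        if self._state is None or self._state.term != term:
            self._state = PerTermState(term=term)
        return self._state
```

Because the whole struct is replaced at once, adding a new per-term field later does not require touching the reset logic.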

Counts rebalancing is now handled by partition balancer.

As members_backend now doesn't compute any reassignments related to decommission itself, we can simplify the decommission finish condition and simply use the partition allocator to check that the node is empty.

ztlpn · 2023-06-15T18:32:38Z

test failures (all known)

ztlpn · 2023-06-15T18:41:42Z

/ci-repeat 2
skip-units
dt-repeat=10
tests/rptest/tests/nodes_decommissioning_test.py
tests/rptest/tests/scaling_up_test.py
tests/rptest/tests/partition_balancer_test.py
tests/rptest/tests/random_node_operations_test.py

ztlpn · 2023-06-16T10:45:12Z

/ci-repeat 2
skip-units
dt-repeat=10
tests/rptest/tests/nodes_decommissioning_test.py
tests/rptest/tests/scaling_up_test.py
tests/rptest/tests/partition_balancer_test.py
tests/rptest/tests/random_node_operations_test.py

ztlpn requested review from bharathv and mmaslankaprv June 12, 2023 18:41

github-actions bot added the area/redpanda label Jun 12, 2023

ztlpn force-pushed the pb-counts-rebalancing branch 3 times, most recently from afa0f1d to 79f3508 Compare June 13, 2023 17:32

mmaslankaprv previously approved these changes Jun 14, 2023


ztlpn force-pushed the pb-counts-rebalancing branch 2 times, most recently from 3fba0c5 to d6ea177 Compare June 14, 2023 22:39

ztlpn added 8 commits June 15, 2023 01:40

c/partition_balancer: early exit when draining nodes

54c722c

We can rely on partition allocator state to detect the case when all necessary node drain actions already have been done or scheduled.

c/partition_allocator: add allocated_replica reallocation revert API

de68022

c/partition_allocator: add reallocation revert utests

e68552e

c/partition_balancer: add move revert API

fa6f8bb

c/pb_simulator: add counts rebalancing simulation

2cde88a

c/topic_table: expose topics_map_revision for manual checks

f102148

c/partition_balancer: random partition order in counts rebalancing

c8d0315

We need to try moving partition replicas in random order to avoid creating topic hotspots after counts rebalancing (i.e. when some of the topic partitions are not present on added nodes).

ztlpn force-pushed the pb-counts-rebalancing branch from d6ea177 to c45d4f0 Compare June 14, 2023 22:40

ztlpn requested a review from mmaslankaprv June 14, 2023 23:02

ztlpn force-pushed the pb-counts-rebalancing branch from c45d4f0 to 17045c1 Compare June 15, 2023 11:21

ztlpn added 5 commits June 15, 2023 14:38

cluster: perform rebalance on node add from partition balancer

5a885ea

c/partition_balancer: gather per-term state in a single struct

0f9be21

Some of the state in the balancer backend becomes obsolete when the term changes. To make it easy to reset all this state in one go, gather it in a single struct.

c/partition_balancer: add ondemand rebalance

af130f2

c/members_backend: remove legacy rebalancing code

880b18d

Counts rebalancing is now handled by partition balancer.

c/members_backend: simplify decommission finish conditions

0120266

As members_backend now doesn't compute any reassignments related to decommission itself, we can simplify the decommission finish condition and simply use the partition allocator to check that the node is empty.

ztlpn force-pushed the pb-counts-rebalancing branch from 17045c1 to 0120266 Compare June 15, 2023 11:39

mmaslankaprv approved these changes Jun 16, 2023


mmaslankaprv merged commit ffcfa7e into redpanda-data:dev Jun 16, 2023

ztlpn mentioned this pull request Jun 22, 2023

cluster: rebalance on node add may not balance optimally when rack aware placement is in use #6058

Closed

ztlpn deleted the pb-counts-rebalancing branch November 27, 2023 13:55
