Node-local core assignment: core count decrease #20312
Conversation
4a37528 to 7c48853
If the number of cores was reduced, we need some way to access the kvstores of the extra cores. Allow constructing a kvstore for shard ids >= the number of cores to achieve that.
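As a rough sketch of the idea (names here are hypothetical, not the actual Redpanda kvstore API): the kvstore data is keyed by shard id on disk, so a store for a shard id that no longer corresponds to a running core can still be opened and drained.

```cpp
// Hypothetical sketch only; real kvstore construction takes more configuration.
#include <cassert>
#include <string>

struct extra_kvstore {
    explicit extra_kvstore(unsigned shard_id)
      : data_dir("kvstore/" + std::to_string(shard_id)) {}
    std::string data_dir; // per-shard directory survives the core count change
};

extra_kvstore open_extra_kvstore(unsigned extra_shard_id, unsigned core_count) {
    // Only meaningful for shards left over after a core count decrease.
    assert(extra_shard_id >= core_count);
    return extra_kvstore(extra_shard_id);
}
```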
Since kvstore operations can in theory fail, copying everything and then removing it (after the copy has fully succeeded) is better than moving pieces of kvstore state one by one (in practice a move is still a piecewise copy-then-remove anyway). A second reason: we need separate remove helpers to clean up garbage and obsolete kvstore data.
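A minimal illustration of the copy-then-remove approach, using a simple in-memory stand-in for the kvstore (try_put and the map type are placeholders, not the real API):

```cpp
#include <map>
#include <string>

using kv_map = std::map<std::string, std::string>;

// try_put stands in for a kvstore write that may fail.
bool try_put(kv_map& store, const std::string& key, const std::string& value) {
    store[key] = value;
    return true;
}

// Copy every key first; remove the source only after the whole copy
// succeeded. If any write fails, the source is still complete and the
// operation can simply be retried from scratch.
bool copy_then_remove(kv_map& source, kv_map& destination) {
    for (const auto& [key, value] : source) {
        if (!try_put(destination, key, value)) {
            return false; // safe to retry: source untouched
        }
    }
    source.clear(); // piecewise removal is fine once the copy is durable
    return true;
}
```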
Sometimes a partition should still exist on this node, but its kvstore state is no longer relevant (e.g. it was transferred to a different shard but hadn't been deleted yet). Handle this case in shard_placement_table and controller_backend.
…d transfers Previously, if a cross-shard transfer failed, we couldn't really tell on the source shard whether we should retry or not (we may have failed to remove obsolete state after a successful transfer, in which case retrying is dangerous). Mark the state on the source shard obsolete immediately after a successful transfer to fix that. Also introduce more detailed failure conditions in prepare_transfer() - are we waiting for the source or the destination shard? This will come in handy when we implement moving data from extra shards, because we'll have to clean the destination ourselves.
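One way to picture the more detailed prepare_transfer() outcomes described above (this enum is illustrative; the real return type may differ):

```cpp
// Illustrative only: distinguish who the caller is waiting on, so that when
// the source is an extra shard (with no live destination instance to clean
// itself up), the caller knows it must clear the obsolete destination state.
enum class prepare_transfer_status {
    ready,                   // transfer can proceed
    waiting_for_source,      // source shard hasn't finished producing its state
    waiting_for_destination, // destination shard still holds obsolete state
};
```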
No functional changes.
Pass the current number of kvstore shards to the start method and move existing partitions on extra shards to one of the current shards if possible.
Calculate the maximum allowed number of partition replicas with the new core count and reject the core count decrease if the total number of partition replicas is greater.
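Roughly, the rejection condition amounts to the following comparison (partitions_per_core here stands in for the node's per-core partition capacity; the actual validation may account for more):

```cpp
// Illustrative check only: a core count decrease is rejected when the
// replicas already hosted on the node would not fit on the remaining cores.
#include <cstdint>

bool core_count_decrease_allowed(
  uint32_t new_core_count,
  uint64_t partition_replicas_on_node,
  uint64_t partitions_per_core) {
    const uint64_t max_allowed
      = static_cast<uint64_t>(new_core_count) * partitions_per_core;
    return partition_replicas_on_node <= max_allowed;
}
```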
Now that shard_balancer will copy partition data from extra kvstore shards, we can relax the check in validate_configuration_invariants.
7c48853 to a2a27c6
@@ -365,6 +370,9 @@ ss::future<> shard_placement_table::initialize_from_kvstore(
      [&ntp2init_data](shard_placement_table& spt) {
          return spt.scatter_init_data(ntp2init_data);
      });
      for (auto& spt : extra_spts) {
          co_await spt->scatter_init_data(ntp2init_data);
      }
any reason why we process existing shard data concurrently, but extra shards one by one?
No particular reason, but scatter_init_data is CPU-bound, so no benefit in doing it concurrently either.
Implement copying partition data from extra kvstore shards (i.e. kvstore shards with ids >= current shard count) and use it to allow decreasing core count.
Backports Required
Release Notes
Features