[storage]: add local retention reclaimable to partition health report #12737
return do_truncate_prefix(cfg)
.then([this] {
Just noting there's still room for a race where the reported size is lower than the reclaimable size, I think, since size_bytes gets updated in do_truncate() and there is this scheduling point here.
It seems hard to be robust, so it's probably fine leaving this, but with the caveat that partition balancing should sanitize these values before making decisions.
> It seems hard to be robust, so it's probably fine leaving this, but with the caveat that partition balancing should sanitize these values before making decisions.
yeh. i added a comment to the interface. flagging directly to @ztlpn for visibility.
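A minimal sketch of the sanitizing step discussed in this thread, assuming the consumer receives a total size and a reclaimable size per partition. The `partition_sizes` struct and both function names are hypothetical and are not taken from the Redpanda code base; the point is only that a reclaimable value larger than the reported size can be clamped before it is used.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the per-partition values in a health report.
struct partition_sizes {
    uint64_t size_bytes;
    uint64_t reclaimable_size_bytes;
};

// Clamp the reclaimable estimate so it never exceeds the reported size.
// The two values may be sampled at different times, so the reported size
// can briefly be lower than the reclaimable size.
inline partition_sizes sanitize(partition_sizes reported) {
    reported.reclaimable_size_bytes = std::min(
      reported.reclaimable_size_bytes, reported.size_bytes);
    return reported;
}

// Bytes expected to remain on disk if all reclaimable data were removed.
// Sanitizing first keeps the unsigned subtraction from wrapping around.
inline uint64_t non_reclaimable_bytes(partition_sizes reported) {
    const auto s = sanitize(reported);
    return s.size_bytes - s.reclaimable_size_bytes;
}
```

Clamping is only one possible policy. A consumer could instead discard inconsistent samples and wait for the next report.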
Seems to have been partially cleaned up in redpanda-data#9121. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
The additional point where data is updated seems to have been overlooked in redpanda-data@26fb548#diff-708b13fb33ad235d5c420e38131e7188daeee60ff6a4e1054ed480a36142ccb2 Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
When reclaimable space is calculated the size above local retention is cached so that it is available to the health report subsystem. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Tests that the reclaimable local size from cloud storage topic is reflected back through health report. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
This happens in several places so factoring it out into a reusable method. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
@dotnwat should we backport it to 23.2.x, WDYT?
@ztlpn yes. in the PR description I wrote
I have on my todo list here to ask you about this. Perhaps we should discuss how to proceed on the balancer? I'm also happy to backport this PR now if it helps.
/backport v23.2.x
Failed to create a backport PR to v23.2.x branch. I tried:
The partition balancer needs to know how much of a partition's size is easily reclaimable. That is, the amount of data that has grown best-effort above local retention threshold. This will be used to adjust the balancer's view of how much free space is available on a node.
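As a rough illustration of how a consumer might fold this value into its view of free space, the sketch below adds the per-partition reclaimable bytes to a node's free bytes. It is not the balancer's actual logic: `partition_report`, `node_disk`, and `effective_free_bytes` are hypothetical names, and capping the result at the disk's total capacity is an assumption made here for safety.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-partition entry from a health report.
struct partition_report {
    uint64_t size_bytes;
    uint64_t reclaimable_size_bytes;
};

// Hypothetical disk usage summary for one node.
struct node_disk {
    uint64_t total_bytes;
    uint64_t free_bytes;
};

// Free space the node would have if all reclaimable data were removed.
inline uint64_t effective_free_bytes(
  const node_disk& disk, const std::vector<partition_report>& partitions) {
    uint64_t reclaimable = 0;
    for (const auto& p : partitions) {
        // Never count more than the partition's reported size.
        reclaimable += std::min(p.reclaimable_size_bytes, p.size_bytes);
    }
    const uint64_t free_bytes = std::min(disk.free_bytes, disk.total_bytes);
    const uint64_t used_bytes = disk.total_bytes - free_bytes;
    // Reclaiming cannot free more than what is currently in use.
    return free_bytes + std::min(reclaimable, used_bytes);
}
```

For example, a node with 1000 bytes total and 200 free, hosting partitions that report 100 and 50 reclaimable bytes after clamping, would be treated as having 350 bytes of effective free space.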
Note: this needs to be backported, but we should wait until we've accumulated changes and tests related to updates of the partition balancer as well. These are being tracked in https://github.com/redpanda-data/core-internal/issues/719
Fixes: https://github.com/redpanda-data/core-internal/issues/725
Backports Required
Release Notes