-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allocator: lower default IO overload threshold #113667
Conversation
The IO overload threshold determines whether a store will be excluded as a target for rebalancing replicas, or leases onto it. The previous thresholds were `0.8` for replica rebalancing (`kv.allocator.replica_io_overload_threshold`) and `0.5` for lease transfers (`kv.allocator.lease_io_overload_threshold`). The previous settings were selected when Admission Control (AC) attempted to stabilize IO overload at a threshold of 1.0. AC will now typically stabilize a store to a threshold of 0.5. Lower the thresholds to `0.4` for replica rebalancing and `0.3` for lease transfers. Note that a store needs to both exceed the absolute threshold mentioned above, as well as be greater than `+10% * mean` of equivalent rebalancing candidates. Resolves: cockroachdb#112497 Release note: None
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
I think we should hold off on this until we have a little more experience with real clusters with the current settings. I'm concerned that lowering them too much could be a problem. That said, I think the defaults are probably too high regardless of this. if we were only taking into account the number of files (and not the sublevels) - I would be even more in favor of dropping this now. |
On principle the change is logical, however I agree we should perform additional testing before making a call. I believe merging this change into master, but not backporting could provide some coverage to that end. My concern is, if AC is already shaping normal priority traffic at 0.5, then blocking rebalancing at a higher value (0.8) is illogical, since the store may never arrive at 0.8 -- but rebalancing additional replicas onto this store would contribute negatively. The downside is there is additional churn, as candidates are more eagerly excluded, but later are rebalanced towards. The wasted snapshots are unfortunate, but not that big of an issue. If the snapshots themselves are enough to push over a store's LSM, you could imagine that some sort of metastable thrashing could occur. I haven't tried testing this out with a likely workload, and it might just be hot air. In any case, I'll let this PR sit for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @kvoli)
bors r=sumeerbhola |
Build succeeded: |
blathers backport 23.2 |
transfers and replica rebalancing to `0.3` and `0.4` respectively. By the same rationale in cockroachdb#113667, lower the default threshold for shedding leases based on a store's IO overload from `0.9` to `0.5`. Note lease shedding on IO overload is not enabled by default. Epic: none Release note: None
transfers and replica rebalancing to `0.3` and `0.4` respectively. By the same rationale in cockroachdb#113667, lower the default threshold for shedding leases based on a store's IO overload from `0.9` to `0.5`. Note lease shedding on IO overload is not enabled by default. Epic: none Release note: None
117703: allocatorimpl: lower default io overload lease shed threshold to 0.5 r=andrewbaptist a=kvoli transfers and replica rebalancing to `0.3` and `0.4` respectively. By the same rationale in #113667, lower the default threshold for shedding leases based on a store's IO overload from `0.9` to `0.5`. Note lease shedding on IO overload is not enabled by default. Epic: None Release note: None 118249: roachtest: deflake upgrade_skip_version r=rafiss a=rafiss Give more time for the upgrade to complete, since it seems to take a while in CI. This is done by adding an option to customize the retry duration. fixes #118153 Release note: None Co-authored-by: Austen McClernon <austen@cockroachlabs.com> Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
The IO overload threshold determines whether a store will be excluded as a target for rebalancing replicas, or leases onto it. The previous thresholds were
0.8
for replica rebalancing(
kv.allocator.replica_io_overload_threshold
) and0.5
for lease transfers (kv.allocator.lease_io_overload_threshold
).The previous settings were selected when Admission Control (AC) attempted to stabilize IO overload at a threshold of 1.0. AC will now typically stabilize a store to a threshold of 0.5.
Lower the thresholds to
0.4
for replica rebalancing and0.3
for lease transfers. Note that a store needs to both exceed the absolute threshold mentioned above, as well as be greater than+10% * mean
of equivalent rebalancing candidates.Resolves: #112497
Release note: None