base: increase `RaftTickInterval` from 200 ms to 500 ms #98584
Merged
erikgrinaker force-pushed the raft-tick-interval branch from 6f44dbe to e6cdbb7 (March 14, 2023 15:14)
I haven't done any testing here beyond benchmarking; will see how the nightlies fare.
erikgrinaker changed the title from "base: increase `RaftTickInteval` from 200 ms to 500 ms" to "base: increase `RaftTickInterval` from 200 ms to 500 ms" (Mar 14, 2023)
erikgrinaker force-pushed the raft-tick-interval branch from e6cdbb7 to 3f75db6 (March 14, 2023 17:21)
tbg approved these changes (Mar 15, 2023)
erikgrinaker force-pushed the raft-tick-interval branch from 3f75db6 to ed5bb53 (March 15, 2023 11:59)
base: don't express `RaftDelaySplitToSuppressSnapshot` in ticks

Expressing this parameter in Raft ticks was just confusing, and changing the Raft tick interval will inadvertently change this value. It had no functional dependence on Raft ticks.

The wall-time value remains roughly the same.

Epic: none
Release note: None
base: increase `RaftTickInterval` from 200 ms to 500 ms

Tick costs for unquiesced ranges can use a large amount of CPU on nodes with many replicas. Increasing the tick interval from 200 ms to 500 ms reduces this CPU cost by 60%. On a 3-node cluster with 50,000 unquiesced ranges, this reduced the total CPU usage when idle from 54% to 32%.

All derived intervals and timeouts have been adjusted such that they remain the same in wall time.

This increases the latency (from 200 to 500 ms) for tick-driven actions:

* Transfers of Raft leadership to leaseholders.
* Follower overload pausing.
* Updating the node liveness map.
* Updating the IO thresholds map.

Furthermore, because it reduces the resolution of the randomized Raft election timeout interval from [10-20) ticks to [4-8) ticks, it increases the chance of collisions and thus the chance of unsuccessful elections.

Environment variables have been added to adjust this and any tick-dependent values at runtime in case problems arise.

Epic: none
Release note (performance improvement): The Raft tick interval has been increased from 200 ms to 500 ms in order to reduce per-replica CPU costs, and can now be adjusted via `COCKROACH_RAFT_TICK_INTERVAL`. Dependent parameters such as the Raft election timeout (`COCKROACH_RAFT_ELECTION_TIMEOUT_TICKS`), reproposal timeout (`COCKROACH_RAFT_REPROPOSAL_TIMEOUT_TICKS`), and heartbeat interval (`COCKROACH_RAFT_HEARTBEAT_INTERVAL_TICKS`) have been adjusted such that their wall-time value remains the same.
erikgrinaker force-pushed the raft-tick-interval branch from ed5bb53 to 5e6698e (March 15, 2023 20:01)
bors r+
Build succeeded
This was referenced Mar 16, 2023