Adds performance tuning capability for Raft, detuned defaults, and supplemental docs. #2303
@@ -17,6 +17,10 @@ const (
 	DefaultDC          = "dc1"
 	DefaultLANSerfPort = 8301
 	DefaultWANSerfPort = 8302
+
+	// See docs/guides/performance.html for information on how this value
+	// was obtained.
+	DefaultRaftMultiplier uint = 5
 )

 var (
@@ -314,8 +318,11 @@ func DefaultConfig() *Config {
 		CoordinateUpdateBatchSize:  128,
 		CoordinateUpdateMaxBatches: 5,

-		// Hold an RPC for up to 5 seconds by default
-		RPCHoldTimeout: 5 * time.Second,
+		// This holds RPCs during leader elections. For the default Raft
+		// config the election timeout is 5 seconds, so we set this a
+		// bit longer to try to cover that period. This should be more
+		// than enough when running in the high performance mode.
+		RPCHoldTimeout: 7 * time.Second,
 	}

 	// Increase our reap interval to 3 days instead of 24h.
@@ -333,13 +340,28 @@ func DefaultConfig() *Config {
 	// Enable interoperability with unversioned Raft library, and don't
 	// start using new ID-based features yet.
 	conf.RaftConfig.ProtocolVersion = 1
+	conf.ScaleRaft(DefaultRaftMultiplier)

 	// Disable shutdown on removal
 	conf.RaftConfig.ShutdownOnRemove = false

 	return conf
 }

+// ScaleRaft sets the config to have Raft timing parameters scaled by the given
+// performance multiplier. This is done in an idempotent way so it's not tricky
+// to call this when composing configurations and potentially calling this
+// multiple times on the same structure.
+func (c *Config) ScaleRaft(raftMultRaw uint) {
+	raftMult := time.Duration(raftMultRaw)
+
+	def := raft.DefaultConfig()
+	c.RaftConfig.HeartbeatTimeout = raftMult * def.HeartbeatTimeout
+	c.RaftConfig.ElectionTimeout = raftMult * def.ElectionTimeout
+	c.RaftConfig.CommitTimeout = raftMult * def.CommitTimeout

Review comment: Do we want to scale this? This won't affect stability, but it affects the commit tail latency on followers.
Reply: Oh, good catch. I'll remove this.

+	c.RaftConfig.LeaderLeaseTimeout = raftMult * def.LeaderLeaseTimeout
+}
+
 func (c *Config) tlsConfig() *tlsutil.Config {
 	tlsConf := &tlsutil.Config{
 		VerifyIncoming: c.VerifyIncoming,
@@ -820,11 +820,12 @@ func (s *Server) Stats() map[string]map[string]string {
 	s.remoteLock.RUnlock()
 	stats := map[string]map[string]string{
 		"consul": map[string]string{
-			"server":            "true",
-			"leader":            fmt.Sprintf("%v", s.IsLeader()),
-			"leader_addr":       string(s.raft.Leader()),
-			"bootstrap":         fmt.Sprintf("%v", s.config.Bootstrap),
-			"known_datacenters": toString(uint64(numKnownDCs)),
+			"server":               "true",
+			"leader":               fmt.Sprintf("%v", s.IsLeader()),
+			"leader_addr":          string(s.raft.Leader()),
+			"bootstrap":            fmt.Sprintf("%v", s.config.Bootstrap),
+			"known_datacenters":    toString(uint64(numKnownDCs)),
+			"leader_lease_timeout": fmt.Sprintf("%v", s.config.RaftConfig.LeaderLeaseTimeout),

Review comment: Seems odd to expose this since it is static?
Reply: Ah yeah, I was testing and using this as a sanity check, but I'll remove it.

 		},
 		"raft":     s.raft.Stats(),
 		"serf_lan": s.serfLAN.Stats(),
@@ -576,6 +576,24 @@ Consul will not enable TLS for the HTTP API unless the `https` port has been ass
 * <a name="node_name"></a><a href="#node_name">`node_name`</a> Equivalent to the
   [`-node` command-line flag](#_node).

+* <a name="performance"></a><a href="#performance">`performance`</a> Available in Consul 0.7 and
+  later, this is a nested object that allows tuning the performance of different subsystems in
+  Consul. See the [Server Performance](/docs/guides/performance.html) guide for more details. The
+  following parameters are available:
+    * <a name="raft_multiplier"></a><a href="#raft_multiplier">`raft_multiplier`</a> - An integer
+      multiplier used by Consul servers to scale key Raft timing parameters. Tuning this affects
+      the time it takes Consul to detect leader failures and to perform leader elections, at the
+      expense of requiring more network and CPU resources for better performance.<br><br>A value
+      of 0, the default, means that Consul will use a lower-performance timing that's suitable for
+      [minimal Consul servers](/docs/guides/performance.html#minumum), currently equivalent to
+      setting this to a value of 5 (this default may be changed in future versions of Consul,
+      depending if the target minimum server profile changes). Above 0, higher values imply lower
+      levels of performance. Setting this to a value of 1 will configure Raft to its

Review comment: "Above 0, higher values imply lower levels of performance." is confusing. Consider rephrasing to "The zero value uses the default, lower values are used to tighten timing and increase sensitivity while higher values relax timings and reduce sensitivity."

+      highest-performance mode, equivalent to the default timing of Consul prior to 0.7, and is
+      recommended for [production Consul servers](/docs/guides/performance.html#production). See
+      the note on [last contact](/docs/guides/performance.html#last-contact) timing for more
+      details on tuning this parameter.
+
 * <a name="ports"></a><a href="#ports">`ports`</a> This is a nested object that allows setting
   the bind ports for the following keys:
     * <a name="dns_port"></a><a href="#dns_port">`dns`</a> - The DNS server, -1 to disable. Default 8600.
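The zero-value behaviour described for `raft_multiplier` could be resolved along the lines of the sketch below. `effectiveRaftMultiplier` is a hypothetical helper, not a function from this PR; only the default of 5 comes from the diff (`DefaultRaftMultiplier`).

```go
package main

import "fmt"

// defaultRaftMultiplier mirrors the DefaultRaftMultiplier constant in the diff.
const defaultRaftMultiplier uint = 5

// effectiveRaftMultiplier is a hypothetical helper: a configured value of 0
// means "use the default", and any other value is used as given.
func effectiveRaftMultiplier(configured uint) uint {
	if configured == 0 {
		return defaultRaftMultiplier
	}
	return configured
}

func main() {
	for _, v := range []uint{0, 1, 5} {
		fmt.Printf("raft_multiplier=%d -> effective=%d\n", v, effectiveRaftMultiplier(v))
	}
}
```

With this reading, leaving the setting out and setting it to 5 behave the same, while 1 selects the tightest timing.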
Review comment: We should add a sanity check for a `MaxRaftMultiplier` as well.
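A sanity check of the kind suggested here might look like the following sketch. The limit of 10 and the names `maxRaftMultiplier` and `validateRaftMultiplier` are assumptions chosen for illustration; the PR text shown here does not specify a maximum.

```go
package main

import "fmt"

// maxRaftMultiplier is a hypothetical upper bound picked for this example.
const maxRaftMultiplier uint = 10

// validateRaftMultiplier rejects multipliers above the assumed maximum.
// A value of 0 is accepted because it means "use the default".
func validateRaftMultiplier(m uint) error {
	if m > maxRaftMultiplier {
		return fmt.Errorf("raft_multiplier %d exceeds the maximum of %d", m, maxRaftMultiplier)
	}
	return nil
}

func main() {
	for _, m := range []uint{0, 5, 11} {
		if err := validateRaftMultiplier(m); err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Printf("raft_multiplier=%d accepted\n", m)
	}
}
```

Rejecting out-of-range values at configuration load time keeps a mistyped multiplier from stretching election timeouts far beyond what the RPC hold timeout is meant to cover.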