Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary locking and serverlist syncing in heartbeats #5654

Merged
merged 3 commits into from
May 8, 2019
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 23 additions & 8 deletions client/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -315,6 +315,9 @@ func NewClient(cfg *config.Config, consulCatalog consul.CatalogAPI, consulServic
// Initialize the server manager
c.servers = servers.New(c.logger, c.shutdownCh, c)

// Start server manager rebalancing go routine
go c.servers.Start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious how this call was dropped? Was it a regression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it was ever there so for long running clients only failed rpcs or discovering new servers would cause reshuffling.


// Initialize the client
if err := c.init(); err != nil {
return nil, fmt.Errorf("failed to initialize client: %v", err)
Expand Down Expand Up @@ -1345,7 +1348,6 @@ func (c *Client) registerAndHeartbeat() {
case <-c.shutdownCh:
return
}

if err := c.updateNodeStatus(); err != nil {
// The servers have changed such that this node has not been
// registered before
Expand Down Expand Up @@ -2342,13 +2344,6 @@ func (c *Client) consulDiscovery() {
func (c *Client) consulDiscoveryImpl() error {
consulLogger := c.logger.Named("consul")

// Acquire heartbeat lock to prevent heartbeat from running
// concurrently with discovery. Concurrent execution is safe, however
// discovery is usually triggered when heartbeating has failed so
// there's no point in allowing it.
c.heartbeatLock.Lock()
defer c.heartbeatLock.Unlock()

dcs, err := c.consulCatalog.Datacenters()
if err != nil {
return fmt.Errorf("client.consul: unable to query Consul datacenters: %v", err)
Expand Down Expand Up @@ -2432,6 +2427,26 @@ DISCOLOOP:

consulLogger.Info("discovered following servers", "servers", nomadServers)

// Check if the list of servers discovered is identical to the list we already have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think this code would be better placed in servers.SetServers. That way it could be done inside the server manager's lock and prevent concurrent RPCs from rotating the list and making this comparison stale.

Not a big deal as there's no data race here, just the possibility of a comparison against stale data which shouldn't do much harm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It would be easily unit testable in server manager as well)

Copy link
Contributor Author

@preetapan preetapan May 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method for comparing whether the two server lists are equal in servers.SetServers does a sort -

m.servers.Sort()

This approach avoids that sort step/modification. I can move over the code though and change the logic in SerServers to avoid the sort of existing servers, which modifies the observable server list.

// If so, we don't need to reset the server list unnecessarily
knownServers := make(map[string]struct{})
serverList := c.servers.GetServers()
for _, s := range serverList {
knownServers[s.Addr.String()] = struct{}{}
}

allFound := true
for _, s := range nomadServers {
_, known := knownServers[s.Addr.String()]
if !known {
allFound = false
break
}
}
if allFound && len(nomadServers) == len(serverList) {
c.logger.Info("Not replacing server list, current server list is identical to servers discovered in Consul")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug log level at most. This should only matter when debugging heartbeating which is hopefully very rare!

Suggested change
c.logger.Info("Not replacing server list, current server list is identical to servers discovered in Consul")
c.logger.Debug("Not replacing server list, current server list is identical to servers discovered in Consul")

return nil
}
// Fire the retry trigger if we have updated the set of servers.
if c.servers.SetServers(nomadServers) {
// Start rebalancing
Expand Down
3 changes: 3 additions & 0 deletions dev/cluster/client1.hcl
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@ data_dir = "/tmp/client1"
# Give the agent a unique name. Defaults to hostname
name = "client1"

# Enable debugging
enable_debug = true

# Enable the client
client {
enabled = true
Expand Down
3 changes: 3 additions & 0 deletions dev/cluster/client2.hcl
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@ data_dir = "/tmp/client2"
# Give the agent a unique name. Defaults to hostname
name = "client2"

# Enable debugging
enable_debug = true
preetapan marked this conversation as resolved.
Show resolved Hide resolved

# Enable the client
client {
enabled = true
Expand Down
2 changes: 1 addition & 1 deletion dev/cluster/server1.hcl
Original file line number Diff line number Diff line change
Expand Up @@ -16,5 +16,5 @@ server {
}

# Self-elect, should be 3 or 5 for production
bootstrap_expect = 3
bootstrap_expect = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This the dev cluster config, so bootstrap expect should match server2 and server3's.

You can override the value on the command line to use this config for a single server cluster or we could add a new config for single server clusters.

}