Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary locking and serverlist syncing in heartbeats #5654

Merged
merged 3 commits into from
May 8, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 3 additions & 8 deletions client/client.go
Original file line number Diff line number Diff line change
Expand Up @@ -315,6 +315,9 @@ func NewClient(cfg *config.Config, consulCatalog consul.CatalogAPI, consulServic
// Initialize the server manager
c.servers = servers.New(c.logger, c.shutdownCh, c)

// Start server manager rebalancing go routine
go c.servers.Start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious how this call was dropped? Was it a regression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it was ever there so for long running clients only failed rpcs or discovering new servers would cause reshuffling.


// Initialize the client
if err := c.init(); err != nil {
return nil, fmt.Errorf("failed to initialize client: %v", err)
Expand Down Expand Up @@ -1345,7 +1348,6 @@ func (c *Client) registerAndHeartbeat() {
case <-c.shutdownCh:
return
}

if err := c.updateNodeStatus(); err != nil {
// The servers have changed such that this node has not been
// registered before
Expand Down Expand Up @@ -2342,13 +2344,6 @@ func (c *Client) consulDiscovery() {
func (c *Client) consulDiscoveryImpl() error {
consulLogger := c.logger.Named("consul")

// Acquire heartbeat lock to prevent heartbeat from running
// concurrently with discovery. Concurrent execution is safe, however
// discovery is usually triggered when heartbeating has failed so
// there's no point in allowing it.
c.heartbeatLock.Lock()
defer c.heartbeatLock.Unlock()

dcs, err := c.consulCatalog.Datacenters()
if err != nil {
return fmt.Errorf("client.consul: unable to query Consul datacenters: %v", err)
Expand Down
31 changes: 26 additions & 5 deletions client/servers/manager.go
Original file line number Diff line number Diff line change
Expand Up @@ -201,12 +201,16 @@ func (m *Manager) SetServers(servers Servers) bool {
m.Lock()
defer m.Unlock()

// Sort both the existing and incoming servers
servers.Sort()
m.servers.Sort()

// Determine if they are equal
equal := servers.Equal(m.servers)
equal := m.serversAreEqual(servers)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schmichael this is ready for another look


// If server list is equal don't change the list and return immediately
// This prevents unnecessary shuffling of a failed server that was moved to the
// bottom of the list
if equal {
m.logger.Debug("Not replacing server list, current server list is identical to servers discovered in Consul")
return !equal
}

// Randomize the incoming servers
servers.shuffle()
Expand All @@ -215,6 +219,23 @@ func (m *Manager) SetServers(servers Servers) bool {
return !equal
}

// Method to check if the arg list of servers is equal to the one we already have
func (m *Manager) serversAreEqual(servers Servers) bool {
// We use a copy of the server list here because determining
// equality requires a sort step which modifies the order of the server list
var copy Servers
copy = make([]*Server, 0, len(m.servers))
for _, s := range m.servers {
copy = append(copy, s.Copy())
}

// Sort both the existing and incoming servers
copy.Sort()
servers.Sort()

return copy.Equal(servers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better to compare servers lists omitting their DC, only by IP-addresses? Cause servers coming from consulDiscovery() method would be without any DC, whereas current servers would have it.
See https://github.com/hashicorp/nomad/blob/master/client/client.go#L2436

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! As far as I can tell the code to prefer the local DC's server has been broken since v0.8.0. I'm going to remove the DC field and add an issue+TODO.

}

// FindServer returns a server to send an RPC too. If there are no servers, nil
// is returned.
func (m *Manager) FindServer() *Server {
Expand Down
13 changes: 13 additions & 0 deletions client/servers/manager_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,19 @@ func TestServers_SetServers(t *testing.T) {
require.True(m.SetServers([]*servers.Server{s1}))
require.Equal(1, m.NumServers())
require.Len(m.GetServers(), 1)

// Test that the list of servers does not get shuffled
// as a side effect when incoming list is equal
require.True(m.SetServers([]*servers.Server{s1, s2}))
before := m.GetServers()
require.False(m.SetServers([]*servers.Server{s1, s2}))
after := m.GetServers()
preetapan marked this conversation as resolved.
Show resolved Hide resolved
require.Equal(before, after)

// Send a shuffled list, verify original order doesn't change
require.False(m.SetServers([]*servers.Server{s2, s1}))
afterShuffledInput := m.GetServers()
require.Equal(after, afterShuffledInput)
}

func TestServers_FindServer(t *testing.T) {
Expand Down
3 changes: 3 additions & 0 deletions dev/cluster/client1.hcl
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@ data_dir = "/tmp/client1"
# Give the agent a unique name. Defaults to hostname
name = "client1"

# Enable debugging
enable_debug = true

# Enable the client
client {
enabled = true
Expand Down
3 changes: 3 additions & 0 deletions dev/cluster/client2.hcl
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@ data_dir = "/tmp/client2"
# Give the agent a unique name. Defaults to hostname
name = "client2"

# Enable debugging
enable_debug = true
preetapan marked this conversation as resolved.
Show resolved Hide resolved

# Enable the client
client {
enabled = true
Expand Down
3 changes: 3 additions & 0 deletions dev/cluster/client3.hcl
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@ data_dir = "/tmp/client3"
# Give the agent a unique name. Defaults to hostname
name = "client3"

# Enable debugging
enable_debug = true

# Enable the client
client {
enabled = true
Expand Down