Skip to content

Commit

Permalink
[Fix] Services sometimes not being synced with acl_enforce_version_8 …
Browse files Browse the repository at this point in the history
…= false (#4771)

Fixes: #3676

This fixes a bug were registering an agent with a non-existent ACL token can prevent other 
services registered with a good token from being synced to the server when using 
`acl_enforce_version_8 = false`.

## Background

When `acl_enforce_version_8` is off the agent does not check the ACL token validity before 
storing the service in its state.
When syncing a service registered with a missing ACL token we fall into the default error 
handling case (https://github.com/hashicorp/consul/blob/master/agent/local/state.go#L1255)
and stop the sync (https://github.com/hashicorp/consul/blob/master/agent/local/state.go#L1082)
without setting its Synced property to true like in the permission denied case.
This means that the sync will always stop at the faulty service(s).
The order in which the services are synced is random since we iterate on a map. So eventually
all services with good ACL tokens will be synced, this can however take some time and is influenced 
by the cluster size, the bigger the slower because retries are less frequent.
Having a service in this state also prevent all further sync of checks as they are done after
the services.

## Changes 

This change modify the sync process to continue even if there is an error. 
This fixes the issue described above as well as making the sync more error tolerant: if the server repeatedly refuses
a service (the ACL token could have been deleted by the time the service is synced, the servers 
were upgraded to a newer version that has more strict checks on the service definition...). 
Then all services and check that can be synced will, and those that don't will be marked as errors in 
the logs instead of blocking the whole process.
  • Loading branch information
Aestek authored and mkeeler committed Jan 4, 2019
1 parent fde5d75 commit 5960974
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions agent/local/state.go
Original file line number Diff line number Diff line change
Expand Up @@ -1210,7 +1210,7 @@ func (l *State) deleteService(id string) error {
l.logger.Printf("[INFO] agent: Deregistered service %q", id)
return nil

case acl.IsErrPermissionDenied(err):
case acl.IsErrPermissionDenied(err), acl.IsErrNotFound(err):
// todo(fs): mark the service to be in sync to prevent excessive retrying before next full sync
// todo(fs): some backoff strategy might be a better solution
l.services[id].InSync = true
Expand Down Expand Up @@ -1248,7 +1248,7 @@ func (l *State) deleteCheck(id types.CheckID) error {
l.logger.Printf("[INFO] agent: Deregistered check %q", id)
return nil

case acl.IsErrPermissionDenied(err):
case acl.IsErrPermissionDenied(err), acl.IsErrNotFound(err):
// todo(fs): mark the check to be in sync to prevent excessive retrying before next full sync
// todo(fs): some backoff strategy might be a better solution
l.checks[id].InSync = true
Expand Down Expand Up @@ -1316,7 +1316,7 @@ func (l *State) syncService(id string) error {
l.logger.Printf("[INFO] agent: Synced service %q", id)
return nil

case acl.IsErrPermissionDenied(err):
case acl.IsErrPermissionDenied(err), acl.IsErrNotFound(err):
// todo(fs): mark the service and the checks to be in sync to prevent excessive retrying before next full sync
// todo(fs): some backoff strategy might be a better solution
l.services[id].InSync = true
Expand Down Expand Up @@ -1365,7 +1365,7 @@ func (l *State) syncCheck(id types.CheckID) error {
l.logger.Printf("[INFO] agent: Synced check %q", id)
return nil

case acl.IsErrPermissionDenied(err):
case acl.IsErrPermissionDenied(err), acl.IsErrNotFound(err):
// todo(fs): mark the check to be in sync to prevent excessive retrying before next full sync
// todo(fs): some backoff strategy might be a better solution
l.checks[id].InSync = true
Expand Down Expand Up @@ -1397,7 +1397,7 @@ func (l *State) syncNodeInfo() error {
l.logger.Printf("[INFO] agent: Synced node info")
return nil

case acl.IsErrPermissionDenied(err):
case acl.IsErrPermissionDenied(err), acl.IsErrNotFound(err):
// todo(fs): mark the node info to be in sync to prevent excessive retrying before next full sync
// todo(fs): some backoff strategy might be a better solution
l.nodeInfoInSync = true
Expand Down

0 comments on commit 5960974

Please sign in to comment.