
Fix etcd3 locks to handle improper shutdown #2526

Merged

Conversation

marshallbrekka (Contributor)

Problem: Etcd3 locks are not released if an instance is shut down improperly.

Example

After an instance has shut down improperly, its lock entry remains in Etcd:

$ ETCDCTL_API=3 /tmp/test-etcd/etcdctl  get /vault/core/lock/ --keys-only=true --prefix=true

/vault/core/lock/49075afbd30ce3c9

/vault/core/lock/7e055af811b4bb53

/vault/core/lock/7e055afbd3165b61

Inspecting them individually shows that the two pending locks (from the live instances) have leases attached (results trimmed for brevity; etcd returns keys and values base64-encoded in JSON output).

$ ETCDCTL_API=3 /tmp/test-etcd/etcdctl get /vault/core/lock/49075afbd30ce3c9 --write-out json

{  
   "kvs":[  
      {  
         "key":"L3ZhdWx0L2NvcmUvbG9jay80OTA3NWFmYmQzMGNlM2M5",
         "lease":5262274727229842377
      }
   ]
}

$ ETCDCTL_API=3 /tmp/test-etcd/etcdctl get /vault/core/lock/7e055afbd3165b61 --write-out json

{  
   "kvs":[  
      {  
         "key":"L3ZhdWx0L2NvcmUvbG9jay83ZTA1NWFmYmQzMTY1YjYx",
         "lease":9080764261287222113
      }
   ]
}

Meanwhile the third key (from the dead instance) holds the lock but has no lease applied:
$ ETCDCTL_API=3 /tmp/test-etcd/etcdctl get /vault/core/lock/7e055af811b4bb53 --write-out json

{  
   "kvs":[  
      {  
         "key":"L3ZhdWx0L2NvcmUvbG9jay83ZTA1NWFmODExYjRiYjUz",
         "value":"MDgxZWNlZDctOThiMC01ZmRmLTZlODMtMzE4NTNiZjdkYzI5"
      }
   ]
}

Solution: Write the lock item with the lease attached.

Looking at the internals of the Mutex.Lock method shows that all that is required is to pass the session's lease along with the put operation.
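
A minimal sketch of the idea using the etcd clientv3 concurrency package (the endpoint, lock prefix, key name, and value below are illustrative, not the exact Vault patch):

package main

import (
	"context"
	"log"

	"github.com/coreos/etcd/clientv3"
	"github.com/coreos/etcd/clientv3/concurrency"
)

func main() {
	client, err := clientv3.New(clientv3.Config{Endpoints: []string{"localhost:2379"}})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// The session creates a lease and keeps it alive in the background.
	// If the process dies without unlocking, the lease expires after the
	// TTL and etcd deletes every key attached to it.
	session, err := concurrency.NewSession(client, concurrency.WithTTL(15))
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	mutex := concurrency.NewMutex(session, "/vault/core/lock/")
	if err := mutex.Lock(context.TODO()); err != nil {
		log.Fatal(err)
	}

	// The fix: write the lock item WITH the session's lease, so this key
	// is also reclaimed automatically after an improper shutdown.
	_, err = client.Put(context.TODO(), "/vault/core/lock/held", "lock-uuid",
		clientv3.WithLease(session.Lease()))
	if err != nil {
		log.Fatal(err)
	}
}

Without the WithLease option, the put creates a plain key with no lease, which is exactly the orphaned entry shown above; with it, the key expires along with the session's lease, at most one TTL after the holder's last keep-alive.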

@jefferai (Member)

Pinging @xiang90 for a review

@xiang90 (Contributor) commented Mar 27, 2017

@jefferai This PR changes the default lock timeout from 60s to 15s. Is this consistent with other backends? If yes, then I am ok with it.

@@ -32,6 +32,9 @@ type EtcdBackend struct {
 	etcd *clientv3.Client
 }
 
+// Etcd default for lease is 60s, set to 15s for faster recovery
xiang90 (Contributor) suggested rewording the comment:

etcd default lease duration is 60s. set to 15s for faster recovery.

@@ -32,6 +32,9 @@ type EtcdBackend struct {
 	etcd *clientv3.Client
 }
 
+// Etcd default for lease is 60s, set to 15s for faster recovery
+const etcd3LockTTLSeconds = 15
xiang90 (Contributor) suggested renaming the constant:

etcd3LockTimeoutInSeconds

@xiang90 (Contributor) commented Mar 27, 2017

LGTM after addressing the minor issues.

Solution: Write the lock item with the lease.
@marshallbrekka force-pushed the etcd3-lock-release-on-bad-shutdown branch from 81a3834 to 9e705b2 on March 27, 2017 at 19:18
@jefferai (Member)

@xiang90 Making it shorter makes sense to me if it's generally stable.

@xiang90 (Contributor) commented Mar 27, 2017

@jefferai OK. Then 15 seconds should be good enough.

@jefferai (Member)

Seems this is ready to merge?

@xiang90 (Contributor) commented Mar 28, 2017

LGTM. I cannot click the button. :P

@jefferai (Member)

@xiang90 Hah, just wanted to make sure that you were fine with the changes. Probably should have just requested a review :-) Thanks!

@jefferai merged commit f06c69e into hashicorp:master on Mar 28, 2017
@marshallbrekka deleted the etcd3-lock-release-on-bad-shutdown branch on March 28, 2017 at 15:09
@raoofm mentioned this pull request on May 16, 2017