
WIP: mod/lock #347

Merged 11 commits into etcd-io:master on Dec 5, 2013

Conversation

benbjohnson
Contributor

Overview

This pull request implements a locking module for etcd. The goal is to provide a way for server nodes to easily lock resources while they are in use and then release the lock when they're done. This locking mechanism is similar, in premise, to the distributed lock system that ZooKeeper provides.

Because etcd does not have ephemeral nodes, a simpler TTL-based approach is used as a fallback for failed nodes. This approach has the benefit that it can be used from languages that don't easily support multithreading, such as Bash scripts.

Adding ephemeral nodes in the future could allow quicker release of locks held by machines that fail completely. However, a TTL should still be used, since a failed machine could simply hang while keeping its connection to etcd open.

Usage

Acquiring a lock

To obtain a lock, simply send a POST request to /mod/lock/<key>:

$ curl -X POST http://localhost:4001/mod/lock/myresource?ttl=60
2

This returns the index of the lock: 2. You can use this index to renew or release the lock later.

If another process already has a lock on /myresource, then the request will hang until the other process releases the lock or the TTL causes the lock to expire.
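
For example, if a second client POSTs to the same key while the lock above is held, the call blocks and only returns its own lock index (shown here as 3 purely for illustration) once the first holder releases the lock or its TTL expires:

$ curl -X POST http://localhost:4001/mod/lock/myresource?ttl=60
3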

Renewing a lock

If you have a long, multi-step process, you can renew the lock you originally obtained by issuing a PUT with the lock index:

$ curl -X PUT http://localhost:4001/mod/lock/myresource/2?ttl=60

This will renew lock index 2 for another 60 seconds if the index still owns the lock. If the lock has been released for some reason, then an error will be returned.

Releasing a lock

When you're done with your lock, you can release it by using a DELETE request with the index:

$ curl -X DELETE http://localhost:4001/mod/lock/myresource/2

Checking for lock existence

You can check which index currently holds the lock by issuing a GET:

$ curl http://localhost:4001/mod/lock/myresource

However, this is mainly useful for debugging. The lock can change hands in the time between the request and the response of the GET call.
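
Putting these calls together, here is a minimal Bash sketch of the acquire/work/release cycle (the resource name, TTL, and the work itself are placeholders):

#!/bin/bash
# Acquire the lock; this blocks until we hold it, and the response body is the lock index.
INDEX=$(curl -s -X POST "http://localhost:4001/mod/lock/myresource?ttl=60")

# Release the lock on exit, even if the work below fails.
trap 'curl -s -X DELETE "http://localhost:4001/mod/lock/myresource/$INDEX"' EXIT

# ... work on the protected resource here ...

# If the work runs long, renew the lock for another 60 seconds.
curl -s -X PUT "http://localhost:4001/mod/lock/myresource/$INDEX?ttl=60"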

Notes

I fixed a bug in the functional tests that caused a bunch of failed assertions. I don't think it's anything serious, but I wanted to get this PR in front of you guys so you can review the approach.

Don't check in this PR yet. :)

/cc: @xiangli-cmu @philips

@derekchiang
Contributor

Ouch, I have totally been working on a lock module too... I told Brandon but did not notify etcd-dev :(

@philips
Contributor

philips commented Nov 30, 2013

@derekchiang Oh dang, I thought you were still working on the set module :( On the upside, this is a really important bit, so having two people who have attempted to implement it means we can get it right.

@derekchiang
Contributor

@philips I realized I needed a lock module as soon as I started working on the set module, so I started writing a lock module instead. Sorry for not communicating this clearly with the team!

My implementation is similar to Ben's, except that mine doesn't allow renewing a lock, and instead of using go-etcd to communicate with the server, it calls mux.ServeHTTP() directly.

So what should I do now?

@philips
Contributor

philips commented Dec 1, 2013

@benbjohnson I like the approach a lot! I think we need to tweak the API slightly to be resilient against long wait times and dying HTTP connections, though:

curl -X POST "http://localhost:4001/mod/lock/myresource?ttl=60&wait=false"
2

As it is now, there isn't a way to refresh your place in line, delete a lock that you no longer want, or recover from a severed HTTP connection if the client or master dies.

Maybe we should always stream back the body but keep the connection open on wait=true too.

Overall it works and looks great on the handful of tests I tried out.

@philips
Contributor

philips commented Dec 1, 2013

Oh, and we should version this API just like the rest. Start at v1.

@benbjohnson
Contributor Author

@philips Sounds good. How do these changes sound:

  1. Add a timeout query parameter. A timeout=0 is the same as wait=false, no timeout means it waits indefinitely, and a timeout value greater than zero means it waits that many seconds before timing out.
  2. Check connection status before acquiring lock.
  3. Add cleanup for severed connections.

You can delete a lock using DELETE if you've already acquired it. What do you mean by "stream back the body but keep the connection open"?
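
For illustration only, the proposed timeout parameter might be used like this (hypothetical syntax; not implemented at this point in the thread):

$ curl -X POST "http://localhost:4001/mod/lock/myresource?ttl=60&timeout=0"   # hypothetical: return immediately instead of waiting
$ curl -X POST "http://localhost:4001/mod/lock/myresource?ttl=60&timeout=30"  # hypothetical: give up after 30 seconds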

@philips
Contributor

philips commented Dec 2, 2013

@benbjohnson

  1. Timeout sounds better +1
  2. Yes, making sure the client is still there is important. That would fix the only buglet I saw.
  3. The cleanup for a severed connection will be deletion of the client's key?

Re: "stream back the body but keep the connection open": Right now the API creates an id for you that you don't receive until you successfully acquire the lock. I can see how it would be useful to get your id back immediately in languages where keeping an HTTP connection open is expensive or where the critical section is really long (minutes). So you stream back the ticket and give the client the ability to refresh it within the TTL window. We can add this feature later, but it's something to consider.

@philips
Contributor

philips commented Dec 4, 2013

@benbjohnson Xiang's response refactor got merged, so we can probably clean this up and merge it.

@benbjohnson
Contributor Author

@philips I'm still adding the fixes to clean up broken connections. Then I'll merge in @xiangli-cmu's changes and then merge this branch.

@philips
Contributor

philips commented Dec 4, 2013

@benbjohnson Cool, sounds good. Thanks.

Conflicts:
	server/v2/tests/delete_handler_test.go
	server/v2/tests/get_handler_test.go
	server/v2/tests/post_handler_test.go
	server/v2/tests/put_handler_test.go
	third_party/github.com/coreos/go-etcd/etcd/requests.go
@benbjohnson
Contributor Author

@philips Ok, I got the connection monitoring working well, I versioned the lock to point at /mod/v2/lock, and I merged @xiangli-cmu's changes and fixed the tests. It should be good to go. Merge at will.
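
With the versioned path, the earlier examples would presumably target /mod/v2/lock instead, e.g.:

$ curl -X POST http://localhost:4001/mod/v2/lock/myresource?ttl=60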

philips added a commit that referenced this pull request on Dec 5, 2013
philips merged commit af20be8 into etcd-io:master on Dec 5, 2013