
WIP: mod/lock #347

Merged 11 commits into etcd-io:master on Dec 5, 2013

Conversation

benbjohnson
Contributor

Overview

This pull request implements a locking module for etcd. The goal is to provide a way for server nodes to easily lock resources while they are in use and then release the lock when they're done. This locking mechanism is similar, in premise, to the distributed lock system that ZooKeeper provides.

Because etcd does not have ephemeral nodes, a simpler TTL-based approach is used as a fallback for failed nodes. This approach has the benefit that it can be used from languages that don't easily support multithreading, such as Bash scripts.

Adding ephemeral nodes in the future could allow quicker release of locks held by machines that fail completely. However, a TTL should still be used, since a failed machine could simply hang while keeping its connection to etcd open.

Usage

Acquiring a lock

To obtain a lock, simply send a POST request to /mod/lock/<key>:

$ curl -X POST http://localhost:4001/mod/lock/myresource?ttl=60
2

This returns the index of the lock: 2. You can use this index to renew or release the lock later.

If another process already has a lock on /myresource, then the request will hang until the other process releases the lock or the TTL causes the lock to expire.
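
For example, if a second client POSTs to the same key while the lock above is held, the call blocks and only returns its own lock index (shown here as 3 purely for illustration) once the first holder releases the lock or its TTL expires:

$ curl -X POST http://localhost:4001/mod/lock/myresource?ttl=60
3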

Renewing a lock

If you have a long, multi-step process, you can renew the lock you originally obtained by issuing a PUT with the lock index:

$ curl -X PUT http://localhost:4001/mod/lock/myresource/2?ttl=60

This will renew lock index 2 for another 60 seconds if the index still owns the lock. If the lock has been released for some reason, then an error will be returned.

Releasing a lock

When you're done with your lock, you can release it by using a DELETE request with the index:

$ curl -X DELETE http://localhost:4001/mod/lock/myresource/2

Checking for lock existence

You can check which index currently holds the lock by issuing a GET:

$ curl http://localhost:4001/mod/lock/myresource

However, this is mainly useful for debugging. The lock can change hands in the time between the request and the response of the GET call.
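
Putting these calls together, here is a minimal Bash sketch of the acquire/work/release cycle (the resource name, TTL, and the work itself are placeholders):

#!/bin/bash
# Acquire the lock; this blocks until we hold it, and the response body is the lock index.
INDEX=$(curl -s -X POST "http://localhost:4001/mod/lock/myresource?ttl=60")

# Release the lock on exit, even if the work below fails.
trap 'curl -s -X DELETE "http://localhost:4001/mod/lock/myresource/$INDEX"' EXIT

# ... work on the protected resource here ...

# If the work runs long, renew the lock for another 60 seconds.
curl -s -X PUT "http://localhost:4001/mod/lock/myresource/$INDEX?ttl=60"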

Notes

I fixed a bug in the functional tests that caused a bunch of failed assertions. I don't think it's anything serious, but I wanted to get this PR in front of you guys so you can review the approach.

Don't check in this PR yet. :)

/cc: @xiangli-cmu @philips

@derekchiang
Contributor

Ouch, I have totally been working on a lock module too... I told Brandon but did not notify etcd-dev :(

@philips
Contributor

philips commented Nov 30, 2013

@derekchiang Oh dang, I thought you were still working on the set module :( On the upside, this is a really important bit, so having two people who have attempted to implement it means we can get it right.

@derekchiang
Contributor

@philips I realized I needed a lock module as soon as I started working on the set module, so I started writing a lock module instead. Sorry for not communicating this clearly with the team!

My implementation is similar to Ben's, except that mine doesn't allow renewing a lock, and instead of using go-etcd to communicate with the server, it calls mux.ServeHTTP() directly.

So what should I do now?

@philips
Contributor

philips commented Dec 1, 2013

@benbjohnson I like the approach a lot! I think we need to tweak the API slightly to be resilient against long wait times and dying HTTP connections, though:

curl -X POST "http://localhost:4001/mod/lock/myresource?ttl=60&wait=false"
2

As it is now, there isn't a way to refresh your place in line, delete a lock that you no longer want, or recover from a severed HTTP connection if the client or master dies.

Maybe we should always stream back the body but keep the connection open on wait=true too.

Overall it works and looks great on the handful of tests I tried out.

@philips
Contributor

philips commented Dec 1, 2013

Oh, and we should version this API just like the rest. Start at v1.

@benbjohnson
Contributor Author

@philips Sounds good. How do these changes sound:

  1. Add a timeout query parameter. A timeout=0 is the same as wait=false, no timeout means it waits indefinitely, and a timeout value greater than zero means it waits that many seconds before timing out.
  2. Check connection status before acquiring lock.
  3. Add cleanup for severed connections.

You can delete a lock using DELETE if you've already acquired it. What do you mean by "stream back the body but keep the connection open"?
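
For illustration only, the proposed timeout parameter might be used like this (hypothetical syntax; not implemented at this point in the thread):

$ curl -X POST "http://localhost:4001/mod/lock/myresource?ttl=60&timeout=0"   # hypothetical: return immediately instead of waiting
$ curl -X POST "http://localhost:4001/mod/lock/myresource?ttl=60&timeout=30"  # hypothetical: give up after 30 seconds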

@philips
Contributor

philips commented Dec 2, 2013

@benbjohnson

  1. Timeout sounds better +1
  2. Yes, making sure the client is still there is important. That would fix the only buglet I saw.
  3. The cleanup for a severed connection will be deletion of the client's key?

Re: "stream back the body but keep the connection open": Right now the API creates an id for you that you don't receive until you successfully acquire the lock. I can see how it would be useful to get your id back immediately in languages where keeping an HTTP connection open is expensive or where the critical section is really long (minutes). So you stream back the ticket and give the client the ability to refresh it within the TTL window. We can add this feature later, but it's something to consider.

@philips
Contributor

philips commented Dec 4, 2013

@benbjohnson Xiang's response refactor got merged, so we can probably clean this up and merge it.

@benbjohnson
Contributor Author

@philips I'm still adding the fixes to clean up broken connections. Then I'll merge in @xiangli-cmu's changes and then merge this branch.

@philips
Contributor

philips commented Dec 4, 2013

@benbjohnson Cool, sounds good. Thanks.

Conflicts:
	server/v2/tests/delete_handler_test.go
	server/v2/tests/get_handler_test.go
	server/v2/tests/post_handler_test.go
	server/v2/tests/put_handler_test.go
	third_party/github.com/coreos/go-etcd/etcd/requests.go
@benbjohnson
Contributor Author

@philips Ok, I got the connection monitoring working well, I versioned the lock to point at /mod/v2/lock, and I merged @xiangli-cmu's changes and fixed the tests. It should be good to go. Merge at will.
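
With the versioned path, the earlier examples would presumably target /mod/v2/lock instead, e.g.:

$ curl -X POST http://localhost:4001/mod/v2/lock/myresource?ttl=60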

philips added a commit that referenced this pull request on Dec 5, 2013
philips merged commit af20be8 into etcd-io:master on Dec 5, 2013