-
Notifications
You must be signed in to change notification settings - Fork 9.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
etcd v3 backend with lock support. #15680
Conversation
cb3fbfd
to
19b61d3
Compare
Hi @brunomcustodio, Thanks for putting this together. One question I have to start, since I'm not up to date on the current state of etcd, can we "update" the existing etcd remote state or is there something fundamentally different about etcdv3? |
@jbardin thank you very much for taking the time :-) I'm looking forward to it. Yes, there are some fundamental differences between |
a3da2b0
to
2a2dc89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a great start! I made a number of comments inline, some of which may not apply to etcd.
s := &schema.Backend{ | ||
Schema: map[string]*schema.Schema{ | ||
"endpoints": &schema.Schema{ | ||
Type: schema.TypeString, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're now using a schema, it might be nice to make this a TypeList, rather than relying on splitting a string. We would still need to split the string from an env variable, but the config would be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, many thanks 👍 70aad79
for _, kv := range res.Kvs { | ||
result = append(result, strings.TrimPrefix(string(kv.Key), prefix)) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably a good idea to sort.String(result[1:])
to make sure we get deterministic results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure 👍 b896348
return stateMgr, nil | ||
} | ||
|
||
func (b *Backend) determineKey(name string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the changes in etcd means that there isn't going to be a direct upgrade from etcd2, you can probably get rid of the legacy default state path, and assume everything has a "workspace" and get rid of the conditionals around keyEnvPrefix
and such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(...) you can probably get rid of the legacy default state path (...) and get rid of the conditionals around
keyEnvPrefix
and such.
@jbardin I'm not sure I understand what you mean. 😶 Could you please clarify/exemplify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that the reason the other backends have the "default" state handled separately is solely for backwards compatibility with state files that existed before envs/workspaces.
So rather than dealing with the keyEnvPrefix
you've adopted from the others, you could use the same hierarchy in all cases of prefix/name
, and just include "default" in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done @ 3c21b9c. Thank you for your guidance. 🙂
const ( | ||
lockAcquireTimeout = 2 * time.Second | ||
lockInfoSuffix = ".lockinfo" | ||
lockSuffix = ".lock" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lockSuffix
was strictly a consul implementation detail, so that the terraform lock works just like the consul cli locks. This may be different for etcd, or maybe it doesn't matter at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks 🙂 Removed in 038f5eb.
} | ||
|
||
func (c *RemoteClient) lock() (string, error) { | ||
session, err := etcdv3sync.NewSession(c.Client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what etcd's behavior would be here, but terraform should never recursively lock, so you can check for an existing session and return an error if there is already a lock outstanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍 bb4dec6
} | ||
|
||
c.etcdMutex = mutex | ||
c.etcdSession = session |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again I'm not sure about etcd, but consul sessions favored liveness, and were easier to lose than desired and susceptible to network timeouts. I'm not sure if we need to do the same here, but the consul backend watches the session status and reconnects when necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that's not the case withetcd
, judging by a number of factors including this
https://github.com/coreos/etcd/blob/master/clientv3/concurrency/session.go#L26
https://github.com/coreos/etcd/blob/master/clientv3/concurrency/session.go#L64
and some tests I ran locally using etcdctl
. For example, try running
$ ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:12379 lock my-terraform-lock -- watch date
This will sit on a loop printing the date every two seconds. If you start a second command against a different etcd
member of the same cluster you will (obviously) be left waiting to acquire the lock:
$ ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:22379 lock my-terraform-lock -- watch date
Now, if you take down 127.0.0.1:12379
(the first member), the second command will still be left waiting to acquire the lock. If you take 127.0.0.1:12379
up again the first process resumes printing the date. It is only when the first process is stopped that the second one takes over the lock and starts printing the date. Here's the code for the lock
CLI command, which of course uses the same API as this implementation:
https://github.com/coreos/etcd/blob/master/etcdctl/ctlv3/command/lock_command.go
In the end I'd say we don't need to keep our own goroutine. WDYT @jbardin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Let's favor the simpler solution for now under the assumption that the etc client package takes care of all the details.
if err := c.etcdSession.Close(); err != nil { | ||
errs = multierror.Append(errs, err) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may want to remove the lockInfo too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, of course 🙂 b8f4f6d
Hi @brunomcustodio, Thais looks very good after a preliminary pass through the code. Did you have any specific questions about the locking semantics? |
2a2dc89
to
af2f726
Compare
af2f726
to
52c97e9
Compare
9aa417e
to
bb4dec6
Compare
f60075f
to
038f5eb
Compare
5f23328
to
8f7b315
Compare
0ee7895
to
4d23195
Compare
@jbardin I believe this to be at a good point for another review. I've addressed your suggestions and in the meantime added TLS support (for secure client connections) and documentation. I've added a comment to
I see the |
@jbardin have you had the time to look into this? Do you consider merging it in the short-term? |
I've ran this manually and it seemed to work smoothly. Great job, people! |
( Sorry for the closing/reopening, caused by a slight mistake on my part. ) |
Hi @brunomcustodio, Thanks for hanging in there while we've been so busy! This is looking great! Since backend tests still need to be run manually, we usually paste the test output into the PR comments, to record the acceptance tests as passing. |
@jbardin once again thank you very much for taking the time to review this. Here's the concise version:
I pasted the full version (a second run) in this Gist in case you want to take a look. |
Thanks for all the work @brunomcustodio! Let's drop this into master so people can start trying it out! |
Most welcome @jbardin 🙂 Thanks for taking the time! |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
As a newcomer to Terraform, and having a few
etcd
clusters under my supervision, I thought it would be a nice idea to add anetcdv3
backend with lock support. This is my first attempt at writing one, done mainly as an exercise and by replicating the existingconsul
backend and making whatever changes are required.However, I need a little guidance in what concerns locking, because the lock APIs of
consul
andetcd
are somewhat different and give different guarantees. Besides, I'm not really familiar with Consul.Could anyone please be so kind as to have a look at the attached code and point out what can/must be changed in the
Lock
/Unlock
methods? What is the expected "behaviour" of each method?