Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdserver: fix LeaseRevoke may fail to apply when authentication is enabled and upgrading cluster from etcd-3.2 to etcd-3.3 #11691

Merged

Conversation

wswcfan
Copy link
Contributor

@wswcfan wswcfan commented Mar 11, 2020

fix issue: #11689

@wswcfan wswcfan force-pushed the fix-etcd-3.2-to-3.3-upgrade-bug branch from 9511a84 to 688c0a5 Compare March 11, 2020 16:16
@wswcfan wswcfan changed the title etcdserver: fix LeaseRevoke may fail to apply when authentication is enabled and upgrading cluster from etcd-3.2 go etcd-3.3 etcdserver: fix LeaseRevoke may fail to apply when authentication is enabled and upgrading cluster from etcd-3.2 to etcd-3.3 Mar 11, 2020
@wswcfan wswcfan force-pushed the fix-etcd-3.2-to-3.3-upgrade-bug branch from 688c0a5 to cde7af7 Compare March 11, 2020 18:31

var ctxForAssign context.Context
if ts, ok := as.tokenProvider.(*tokenSimple); ok && ts != nil {
ctx1 := context.WithValue(ctx, "index", uint64(0))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I should merge this PR's change:
#8901

@wswcfan wswcfan force-pushed the fix-etcd-3.2-to-3.3-upgrade-bug branch from cde7af7 to 9cff417 Compare March 12, 2020 13:50
Copy link
Contributor

@mitake mitake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to think about how to handle JWT token and CN based auth. Anyway thanks a lot for your PR!

etcdserver/v3_server.go Show resolved Hide resolved
auth/store.go Show resolved Hide resolved
auth/store.go Show resolved Hide resolved
@tangcong
Copy link
Contributor

tangcong commented Apr 5, 2020

@mitake Could you have free time to see this problem? in the k8s scenario(lesae ttl is short), if users enable auth, this problem is encountered with high probability. If there is no good solution in the short term, can we add some tips in the upgrade document? The current document description is incorrect.

In the general case, upgrading from etcd 3.2 to 3.3 can be a zero-downtime, rolling upgrade:

one by one, stop the etcd v3.2 processes and replace them with etcd v3.3 processes
after running all v3.3 processes, new features in v3.3 are available to the cluster

@mitake
Copy link
Contributor

mitake commented Apr 6, 2020

@tangcong so sorry for my late reply... I've been busy for a while (hopefully 1 or 2 weeks). If you have a time, could you submit a PR for updating the document? The case should be described as an exception for the upgrade operation.

@tangcong
Copy link
Contributor

tangcong commented Apr 6, 2020

@mitake #11759 . thanks.

@mitake
Copy link
Contributor

mitake commented Apr 12, 2020

@tangcong thanks, you'll open a PR which includes this commit? tangcong@dd550dd

@tangcong
Copy link
Contributor

@tangcong thanks, you'll open a PR which includes this commit? tangcong@dd550dd

no, that pr only updates the upgrading document. this pr will solve the problem of how to upgrade safely.

@wswcfan
Copy link
Contributor Author

wswcfan commented May 8, 2020

@mitake, Sorry to bother you, Could you take another look about this problem when you are free?

@mitake
Copy link
Contributor

mitake commented May 8, 2020

@wswcfan @tangcong sorry for my late update. I'm thinking an idea based on a special bypassed interface only for lease revoking. I still feel attaching a fake root token is a little bit dangerous and hard to reason the behavior. I hope I can show the implementation in a few days. How do you think about the idea?

@wswcfan
Copy link
Contributor Author

wswcfan commented May 8, 2020

@mitake thanks, we look forward to your solution.

@mitake
Copy link
Contributor

mitake commented May 11, 2020

@wswcfan @tangcong On the second thought this idea is good enough for the upgrade problem, and I cannot find a time to implement another approach for a while. So let's use this. Thanks a lot for your great PR!

Copy link
Contributor

@mitake mitake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one style comment: could you rename the commit log to something like this? auth, etcdserver: fix LeaseRevoke may fail to apply when authentication is enabled and upgrading cluster from etcd-3.2 to etcd-3.3? Other than that LGTM! cc @jingyih

fix LeaseRevoke may fail to apply when authentication is enabled
and upgrading cluster from etcd-3.2 to etcd-3.3 (etcd-io#11691)
@wswcfan wswcfan force-pushed the fix-etcd-3.2-to-3.3-upgrade-bug branch from 9cff417 to 6e77b87 Compare May 11, 2020 15:58
@wswcfan
Copy link
Contributor Author

wswcfan commented May 11, 2020

@mitake I have modified the commit log, thanks for your suggestion!

Copy link
Contributor

@mitake mitake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! defer to @jingyih

@tangcong
Copy link
Contributor

@jingyih please take a look. thanks.

@jingyih
Copy link
Contributor

jingyih commented May 21, 2020

Sorry for blocking on this. I missed the notification. I will take a look today.

@jingyih
Copy link
Contributor

jingyih commented May 21, 2020

I understand currently 3.2 and 3.3 applies lease revoke request differently. I am trying to summarize the difference below. Please help me complete/correct the table.

3.2
User w/ permission: can revoke lease
User w/o permission: can revoke lease

3.2 + this PR
User w/ permission: can revoke lease
User w/o permission: can revoke lease

3.3
User w/ permission: can revoke lease
User w/o permission: cannot revoke lease

@jingyih
Copy link
Contributor

jingyih commented May 21, 2020

Oh I think I misunderstood what this PR tries to solve. This bug happens when a release revoke entry generated by 3.2 server is applied by 3.3 server. Is my understanding correct?

@wswcfan
Copy link
Contributor Author

wswcfan commented May 21, 2020

@jingyih Yes, you are right. This bug happens when a release revoke entry generated by 3.2 server is applied by 3.3 server.

This PR mainly solves the problems caused by the lease expiration, and does not completely solve the problems caused by users who do not have permission to manually invoke lease revoke during the upgrade process. In extreme cases, if the user does not have permission and sends a lease revoke request to the 3.3 node during the upgrade, the data will still be inconsistent, because the 3.2 node will apply this request but the 3.3 node will not. (I think this is acceptable. This kind of call is basically an abnormal call. We can explain this scenario in the upgrade document. Do you have any better suggestions? thanks.)

@jingyih
Copy link
Contributor

jingyih commented May 21, 2020

LGTM

(In a separate PR) Could you also help update changelog-3.2 and https://github.com/etcd-io/etcd/blob/master/Documentation/upgrades/upgrade_3_3.md#upgrade-checklists (specify which 3.2 patch version has the fix).

@jingyih jingyih merged commit 17acb61 into etcd-io:release-3.2 May 21, 2020
wswcfan added a commit to wswcfan/etcd that referenced this pull request May 22, 2020
@wswcfan
Copy link
Contributor Author

wswcfan commented May 22, 2020

@jingyih done. see #11936

wswcfan added a commit to wswcfan/etcd that referenced this pull request Jun 9, 2020
spzala added a commit that referenced this pull request Jun 9, 2020
CHANGELOG: update 3.2 changelog and 3.3 upgrade document for #11691
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants