Documentation: downgrade clarification #10461

Closed

hexfusion opened this issue Feb 9, 2019 · 4 comments
@hexfusion
Contributor

If a cluster is upgraded to 3.3 from 3.2, the documentation states that downgrade is not possible. [1] This holds true with regard to the binary version of etcd members. For example, once a cluster is upgraded to 3.3 and all members agree on the 3.3 API, the cluster version is persisted to the state file. Downgrading an etcd member to a minor version lower than 3.3 (i.e. 3.2) will then result in a panic.

But in the case of restoring a snapshot taken from a 3.3 cluster onto 3.2 members, no panic occurs.

Questions:

1.) Is restoring a 3.3 snapshot onto 3.2 a valid downgrade path?
2.) If not, although no panic occurs, it seems like different versions of boltdb etc. could cause unknown problems. Should this restore cause a panic in that case?

[1] Documentation/upgrades/upgrade_3_3.md#downgrade

/cc @gyuho @xiang90 @jpbetz @wenjiaswe

@hexfusion
Contributor Author

More information on this after a bit of research. The member will hit the check below and exit with status 1 if the agreed-upon cluster version is greater than the member's own etcd binary version. So, as I noted above, a 3.2 member in a 3.3 cluster will fail.

```go
plog.Fatalf("cluster cannot be downgraded (current version: %s is lower than determined cluster version: %s).", version.Version, version.Cluster(cv.String()))
```
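
For context, here is a self-contained sketch of that check, modeled on the logic in etcd's etcdserver/membership package around 3.3. The binaryVersion constant and the use of the standard log package are stand-ins for etcd's version.Version and plog, so treat this as an approximation rather than the exact source:

```go
package main

import (
	"log"

	"github.com/coreos/go-semver/semver"
)

// binaryVersion stands in for etcd's version.Version (the member's own
// binary version); imagine a 3.2 member joining a 3.3 cluster.
const binaryVersion = "3.2.26"

// mustDetectDowngrade exits with status 1 if the agreed cluster version
// cv is higher than this member's binary version. Note that only
// major.minor is compared; the patch level is ignored.
func mustDetectDowngrade(cv *semver.Version) {
	lv := semver.Must(semver.NewVersion(binaryVersion))
	lv = &semver.Version{Major: lv.Major, Minor: lv.Minor}
	if cv != nil && lv.LessThan(*cv) {
		log.Fatalf("cluster cannot be downgraded (current version: %s is lower than determined cluster version: %d.%d).",
			binaryVersion, cv.Major, cv.Minor)
	}
}

func main() {
	cv := semver.Must(semver.NewVersion("3.3.0"))
	mustDetectDowngrade(cv) // fatal: 3.2 binary, 3.3 cluster version
}
```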

In 3.x the underlying data format has not changed. So, in the same manner that we reset the members bucket during a restore operation, we can also reset the cluster version.
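
To make that concrete, here is a hypothetical sketch of such a reset using bbolt (the storage library etcd's backend is built on). The bucket and key names ("cluster", "clusterVersion") are assumptions for illustration only; where and how the cluster version is persisted differs across etcd releases, so this is a sketch of the approach, not a tested recipe:

```go
package main

import (
	"log"

	bolt "go.etcd.io/bbolt"
)

// resetClusterVersion deletes the persisted cluster-version marker from a
// restored backend file so the cluster re-negotiates its version from the
// member binaries on the next start. Bucket/key names are hypothetical.
func resetClusterVersion(dbPath string) error {
	db, err := bolt.Open(dbPath, 0600, nil)
	if err != nil {
		return err
	}
	defer db.Close()

	return db.Update(func(tx *bolt.Tx) error {
		b := tx.Bucket([]byte("cluster"))
		if b == nil {
			return nil // nothing to reset in this backend
		}
		return b.Delete([]byte("clusterVersion"))
	})
}

func main() {
	// e.g. the bolt file produced by a snapshot restore
	if err := resetClusterVersion("default.etcd/member/snap/db"); err != nil {
		log.Fatal(err)
	}
}
```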

The code is intended to stop etcd members that cannot achieve the minimum API version from entering the cluster. I am going to take on documenting this process, as well as looking into implementing/collaborating with the Google team on @wenjiaswe's graceful downgrade RFC.

ref: #7308

@jpbetz
Contributor

jpbetz commented Feb 16, 2019 via email

@wenjiaswe
Contributor

@hexfusion thanks for digging into this, and sorry for the delay. Yes, you are right: once the cluster version is already at the higher minor version, adding a lower-version member is not acceptable, and that is exactly what we aim to solve with downgrade support. The first step we want to take is to temporarily whitelist the one lower minor version during a downgrade: #9306 (comment). As we discussed offline, thank you so much for offering to start the POC, and let's work together to solve this :)
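
To make the proposed whitelist concrete, here is an illustrative sketch (not the actual implementation from #9306): while a downgrade is in progress, a member exactly one minor version below the cluster version is tolerated instead of rejected.

```go
package main

import (
	"fmt"

	"github.com/coreos/go-semver/semver"
)

// allowedDuringDowngrade is illustrative only, not etcd's actual API.
// It accepts members at or above the cluster's major.minor, and, while a
// downgrade is announced, also members exactly one minor version behind.
func allowedDuringDowngrade(local, cluster semver.Version, downgrading bool) bool {
	l := semver.Version{Major: local.Major, Minor: local.Minor}
	c := semver.Version{Major: cluster.Major, Minor: cluster.Minor}
	if !l.LessThan(c) {
		return true // same or newer major.minor: always accepted
	}
	return downgrading && l.Major == c.Major && l.Minor+1 == c.Minor
}

func main() {
	v32 := *semver.Must(semver.NewVersion("3.2.26"))
	v33 := *semver.Must(semver.NewVersion("3.3.0"))
	fmt.Println(allowedDuringDowngrade(v32, v33, false)) // false: rejected today
	fmt.Println(allowedDuringDowngrade(v32, v33, true))  // true: whitelisted during downgrade
}
```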

@stale

stale bot commented Apr 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 7, 2020
@stale stale bot closed this as completed Apr 28, 2020