*: add "--unsafe-allow-cluster-version-downgrade" for not failing cluster version downgrade #13022
@jpbetz Do we have any idea how widely the lease checkpointer is used? The main motivation here is that, while the downgrade API lets you work around the version checks, it still does not prevent crashes caused by new message fields such as the lease checkpointer's. So we might as well allow built-in downgrade for whoever is willing to take the risk. The risk can be deterministic and calculated if one controls the client behavior.
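To illustrate why controlling client behavior makes the risk calculable, here is a toy sketch in Go. It is not etcd code: `requestKind`, `applyOld`, and `safeToDowngrade` are hypothetical names, and the model assumes an older binary simply panics on any request kind it was not built to handle. If no client ever writes the newer kind, the older binary never sees it.

```go
package main

import "fmt"

// requestKind is a hypothetical stand-in for the type of a replicated request.
type requestKind int

const (
	kindPut requestKind = iota
	kindLeaseGrant
	kindLeaseCheckpoint // assumed to exist only in the newer release
)

// applyOld mimics an older binary that has no handler for newer request kinds.
// The panic is recovered here only so the demo can report it as an error.
func applyOld(k requestKind) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("apply panicked: %v", r)
		}
	}()
	switch k {
	case kindPut, kindLeaseGrant:
		return nil
	default:
		panic(fmt.Sprintf("unknown request kind %d", k))
	}
}

// safeToDowngrade reports whether every request in the log is understood
// by the older binary in this toy model.
func safeToDowngrade(log []requestKind) bool {
	for _, k := range log {
		if applyOld(k) != nil {
			return false
		}
	}
	return true
}

func main() {
	controlled := []requestKind{kindPut, kindLeaseGrant}
	uncontrolled := []requestKind{kindPut, kindLeaseCheckpoint}
	fmt.Println("controlled clients:", safeToDowngrade(controlled))
	fmt.Println("uncontrolled clients:", safeToDowngrade(uncontrolled))
}
```

In this model the downgrade is safe exactly when the replicated log contains only request kinds that the older release knows how to apply.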
Codecov Report
@@ Coverage Diff @@
## main #13022 +/- ##
===========================================
+ Coverage 47.24% 66.99% +19.74%
===========================================
Files 438 420 -18
Lines 34110 33409 -701
===========================================
+ Hits 16116 22382 +6266
+ Misses 16088 9047 -7041
- Partials 1906 1980 +74
78f9d2b to 9711b4f
This makes sense to me, but I was wondering whether it could also work together with the downgrade subcommand. Should we then maybe put it under that as a flag?
etcdctl downgrade --unsafe
Note that the downgrade command work still needs to be finished up. I just wonder whether that would be a somewhat better UX.
server/embed/config.go
@@ -489,7 +499,8 @@ func NewConfig() *Config {
  ExperimentalMemoryMlock: false,
  ExperimentalTxnModeWriteWithSharedBuffer: true,

- V2Deprecation: config.V2_DEPR_DEFAULT,
+ UnsafeAllowClusterVersionDowngrade: DefaultUnsafeAllowClusterVersionDowngrade,
+ V2Deprecation: config.V2_DEPR_DEFAULT,
Nitpick and optional: let's add a new line, as these two are not associated, following the ordering of the rest of the fields.
Good point, fixed.
A big reason for adding a flag is that the downgrade API is not completed on the client side, and we need to integrate this into the operator downgrade workflow. So this PR provides a simpler way to automate minor version downgrades. I can help finish the remaining etcdctl downgrade work if the original author is okay with that.
9711b4f to c8d2dad
c8d2dad to 3d4499a
1c4027b to 94f9936
I added the downgrade API to the client, but part of the command still needs to be worked on. I started the discussion on the original issue #11716.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Compared to the public etcd downgrade design, I am trying to simplify the manual "whitelisting target downgrade version" process, and this approach doesn't require any server-side or client-side API changes. Once the new --unsafe-allow-cluster-version-downgrade flag is enabled, only a downgrade of one minor version is allowed (e.g. v3.6 to v3.5). We can cherry-pick the same change to v3.4 and v3.3 if this feature is useful.
However, if a user is already using the --experimental-enable-lease-checkpoint flag to bootstrap a v3.4 cluster, the version downgrade is not possible because MustUnmarshal will panic on the LeaseCheckpoint internal raft message, which doesn't exist in v3.3. Not to mention that there is no corresponding apply method. As a result, the newly added etcd v3.3 flag --unsafe-allow-cluster-version-downgrade should not be set to true in this case.
I've extensively tested v3.5 to v3.4 and v3.4 to v3.3 downgrades in two environments, and the results look good. I will publish the backport PRs once the community agrees on the approach.
I understand this is a risky operation without controlling the set of APIs used by the etcd client. But, I would like to put it here for reference, as others may find it useful.
Any feedback is welcome. Thanks!