Disable ETCD Learner Mode #7669

Merged

jonathanmeier5
Member

@jonathanmeier5 jonathanmeier5 commented Feb 21, 2024

ETCD Learner Mode went to Beta in k8s 1.29 and is now enabled by default (see here, here).

When a new stacked etcd instance comes up, it joins the cluster in learner mode.

While the instance is still a learner, the API server cannot perform RPC calls against it and fails to come online.

In theory, the etcd instance should then be promoted to a full member, as kubeadm does here.

This is not happening because our Bottlerocket bootstrap container does not allow the etcd phase to complete.

We are disabling this feature for 0.19 and will likely enable it for 0.20 when we also incorporate new CAPI work that allows feature gates to be mutable.

The feature gate will likely go GA in upstream 1.31.
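
For readers unfamiliar with how such a gate is turned off, here is a minimal sketch of explicitly disabling it on a kubeadm-style cluster configuration. It is not the code in this PR: the `ClusterConfiguration` struct below is a hypothetical, pared-down stand-in, `DisableEtcdLearnerMode` is an illustrative helper name, and the gate name `EtcdLearnerMode` is assumed to be the upstream kubeadm feature gate.

```go
package main

import "fmt"

// ClusterConfiguration is a hypothetical stand-in for a kubeadm-style
// cluster configuration. Only the feature-gate map matters for this sketch.
type ClusterConfiguration struct {
	FeatureGates map[string]bool
}

// etcdLearnerModeGate is assumed to be the upstream kubeadm gate name.
const etcdLearnerModeGate = "EtcdLearnerMode"

// DisableEtcdLearnerMode sets the gate to false explicitly, so the Beta
// default (enabled) no longer applies. Other gates are left untouched.
func DisableEtcdLearnerMode(cfg *ClusterConfiguration) {
	if cfg.FeatureGates == nil {
		cfg.FeatureGates = map[string]bool{}
	}
	cfg.FeatureGates[etcdLearnerModeGate] = false
}

func main() {
	cfg := &ClusterConfiguration{
		FeatureGates: map[string]bool{"SomeOtherGate": true}, // illustrative only
	}
	DisableEtcdLearnerMode(cfg)
	fmt.Println(cfg.FeatureGates)
}
```

Writing the gate as an explicit `false`, rather than omitting it, is what matters here: an absent key would fall back to the default, which is enabled from 1.29 on.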

Issue #, if available:
This was causing failures in the *129StackedEtcdUpgrade end-to-end tests.

Testing (if applicable):
Ran TestVSphereKubernetes128BottlerocketTo129StackedEtcdUpgrade with a custom-built controller image that set the EtcdLearnerEnabled feature flag to false.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@eks-distro-bot
Collaborator

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@eks-distro-bot eks-distro-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 21, 2024
@jonathanmeier5 jonathanmeier5 force-pushed the feature/disable-etcd-learner-mode branch from 7b59ee5 to e914cd1 Compare February 26, 2024 22:27
@eks-distro-bot eks-distro-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 26, 2024
@jonathanmeier5 jonathanmeier5 force-pushed the feature/disable-etcd-learner-mode branch from e914cd1 to 9ebb28f Compare February 26, 2024 22:28
@eks-distro-bot eks-distro-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/invalid-commit-message size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 26, 2024

codecov bot commented Feb 26, 2024

Codecov Report

Attention: Patch coverage is 53.84615%, with 6 lines in your changes missing coverage. Please review.

Project coverage is 73.60%. Comparing base (4583834) to head (f44c9e8).
Report is 139 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| pkg/providers/snow/apibuilder.go | 33.33% | 3 Missing and 3 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7669      +/-   ##
==========================================
+ Coverage   73.48%   73.60%   +0.12%     
==========================================
  Files         579      588       +9     
  Lines       36357    37157     +800     
==========================================
+ Hits        26718    27351     +633     
- Misses       7875     8014     +139     
- Partials     1764     1792      +28     


@jonathanmeier5 jonathanmeier5 force-pushed the feature/disable-etcd-learner-mode branch 2 times, most recently from 239f69d to d10e068 Compare February 26, 2024 22:37
@jonathanmeier5 jonathanmeier5 marked this pull request as ready for review February 26, 2024 22:43
@jonathanmeier5 jonathanmeier5 force-pushed the feature/disable-etcd-learner-mode branch from d10e068 to e4ab7ba Compare February 26, 2024 22:51
ETCD Learner Mode went to Beta in k8s 1.29 and is now enabled by default.

When a new stacked etcd instance comes up, it joins the cluster in learner mode.

While the instance is still a learner, the API server cannot perform RPC calls against it and fails to come online.

In theory the etcd instance should be promoted to a full member and this issue should
be resolved, but for some reason this is not happening.

While we investigate a root cause, we are disabling this new feature flag explicitly.
@jonathanmeier5 jonathanmeier5 force-pushed the feature/disable-etcd-learner-mode branch from e4ab7ba to 87f158b Compare February 26, 2024 23:49
@eks-distro-bot eks-distro-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 26, 2024
@abhay-krishna abhay-krishna force-pushed the feature/disable-etcd-learner-mode branch from fdceb1e to f44c9e8 Compare February 27, 2024 00:14
@abhay-krishna
Member

/cherrypick release-0.19

@eks-distro-pr-bot
Contributor

@abhay-krishna: once the present PR merges, I will cherry-pick it on top of release-0.19 in a new PR and assign it to you.

In response to this:

/cherrypick release-0.19

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@abhay-krishna
Member

/lgtm

@abhay-krishna
Member

/approve

@eks-distro-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhay-krishna

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abhay-krishna abhay-krishna merged commit 9a26282 into aws:main Feb 27, 2024
10 of 12 checks passed
@eks-distro-pr-bot
Contributor

@abhay-krishna: new pull request created: #7719

Labels
approved area/docs Documentation documentation lgtm size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
4 participants