Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubeadm feature flags should be mutable #10050

Closed
tobiasgiese opened this issue Jan 24, 2024 · 12 comments · Fixed by #10154
Closed

Kubeadm feature flags should be mutable #10050

tobiasgiese opened this issue Jan 24, 2024 · 12 comments · Fixed by #10154
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@tobiasgiese
Copy link
Member

tobiasgiese commented Jan 24, 2024

What would you like to be added (User Story)?

As an operator I would like to be able to adjust kubeadm feature gates on existing KubeadmControlPlanes.

Detailed Description

Currently we want to enable the Kubeadm v1.27 feature gate EtcdLearnerMode. As we update our existing KCPs we also want to add this new feature flag.
Unfortunately, the KCP webhooks denies the mutation of kcp.Spec.KubeadmConfigSpec.ClusterConfiguration.FeatureGates.

Anything else you would like to add?

In PR #9871 we already allowed several parameters, but the feature gates are still immutable.
IMO it's totally fair to allow it. Also it is not necessary to rollout the KCP immediately after the change.
If a rollout is needed we can trigger it manually.

Label(s) to be applied

/kind feature
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

Tobias Giese tobias.giese@mercedes-benz.com, Mercedes-Benz Tech Innovation GmbH, legal info/Impressum

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 24, 2024
@sbueringer
Copy link
Member

sbueringer commented Jan 24, 2024

Sounds good to me.

Also it is not necessary to rollout the KCP immediately after the change.

If we change the webhook to make the field mutable, what is then the current rollout behavior without further changes?

@tobiasgiese
Copy link
Member Author

tobiasgiese commented Jan 24, 2024

cc @fabriziopandini maybe you have some thoughts on this 🙂

@tobiasgiese
Copy link
Member Author

If we change the webhook to make the field mutable, what is then the current rollout behavior without further changes?

I don't know if a rollout will be triggered by the change. I'll check the code and maybe also test it with our fork.
In our case with the EtcdLearnerMode we don't even need a rollout. The next rollout will enable the feature gate and it is also only necessary for new nodes.

@tobiasgiese
Copy link
Member Author

tobiasgiese commented Jan 24, 2024

@sbueringer so I just tested the mutation of the kubeadm feature gates by deleting the validation webhook capi-kubeadm-control-plane-validating-webhook-configuration.
After adding a kubeadm feature gate the KCP controller will trigger a rolling update immediately. So I think this is exactly what we are trying to achieve, right?

Example

Patch does not work

❯ k -n c42pc004 get kcp
NAME                     CLUSTER    INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
c42pc004-control-plane   c42pc004   true          true                   1          1       1         0             13h   v1.27.9
❯ kubectl -n c42pc004 patch kcp c42pc004-control-plane --type='merge' -p '{"spec":{"kubeadmConfigSpec":{"clusterConfiguration":{"featureGates":{"EtcdLearnerMode": true}}}}}'
The KubeadmControlPlane "c42pc004-control-plane" is invalid: spec.kubeadmConfigSpec.clusterConfiguration.featureGates.EtcdLearnerMode: Forbidden: cannot be modified

Remove the validation webhook

❯ k delete validatingwebhookconfigurations capi-kubeadm-control-plane-validating-webhook-configuration
validatingwebhookconfiguration.admissionregistration.k8s.io "capi-kubeadm-control-plane-validating-webhook-configuration" deleted

Patch the KCP

❯ kubectl -n c42pc004 patch kcp c42pc004-control-plane --type='merge' -p '{"spec":{"kubeadmConfigSpec":{"clusterConfiguration":{"featureGates":{"EtcdLearnerMode": true}}}}}'
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/c42pc004-control-plane patched
❯ k -n c42pc004 get kcp
NAME                     CLUSTER    INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
c42pc004-control-plane   c42pc004   true          true                   2          1       1         1             13h   v1.27.9

And to verify that the feature gate is working:

Jan 24 17:05:34 c42pc004-control-plane-7646d458b6-fjkcn cloud-init[658]: [2024-01-24 17:05:34] I0124 17:05:34.688405    3391 etcd.go:416] [etcd] Adding etcd member as learner: 68747470733a2f2f3139322e3136382e302e37393a32333830
Jan 24 17:05:34 c42pc004-control-plane-7646d458b6-fjkcn cloud-init[658]: [2024-01-24 17:05:34] I0124 17:05:34.781981    3391 etcd.go:506] [etcd] Promoting a learner as a voting member: 8c3d38ccb8d86480

@sbueringer
Copy link
Member

Sounds good. If not strictly required, I would like to avoid having handling for specific kubeadm feature gates in CAPI (eventually the feature gate will go away anyway)

@neolit123
Copy link
Member

neolit123 commented Jan 29, 2024

trying to understand what are the implications.

does this mean that the KCP controller will persist an FG in kube-system/kubeadm-config.ClusterConfiguration ?
on upgrade in a later version if kubeadm join then downloads the ClusterConfiguration and finds a legacy FG, we need to make sure it does not fail. i forgot what's the existing behavior in kubeadm / need to check.

@sbueringer
Copy link
Member

Hm not sure. The proposed change here is mostly about changing the webhook. Not sure what the implications are regarding the ConfigMap

@neolit123
Copy link
Member

kube-system/kubeadm-config.ClusterConfiguration is a versioned API.

thus when converting to internal type a validation will trigger:
https://github.com/kubernetes/kubernetes/blob/0e945e5cec50028bf5ad00c19a710dfc50ddec9f/cmd/kubeadm/app/apis/kubeadm/validation/validation.go#L612

i don't recall instances where in kubeadm we manually removed a FG from the ClusterConfiguration CM, during upgrade.
which means we likely left this action to the user.

in CAPI, on kubeadm join the CM is downloaded, converted to internal type and kubeadm join can fail. this means the CAPI admin must manually remove a legacy FG from kubeadm's ClusterConfiguration if they want the cluster to allow upgrade or machine scale.

@fabriziopandini
Copy link
Member

/triage approved

+1 for the change
As @neolit123 noted out, if we make some field mutable and they are expected to be persisted in the kubeadm config map (or in some other kubeadm managed resource), we should make sure this happens as well.

We already have several examples of this in https://github.com/fabriziopandini/cluster-api/blob/67c222eaaddbee7c48f5514d9f1d86c3c086069d/controlplane/kubeadm/internal/controllers/upgrade.go#L57-L123 (as well as the framework for managing kubeadm config versions)

@k8s-ci-robot
Copy link
Contributor

@fabriziopandini: The label(s) triage/approved cannot be applied, because the repository doesn't have them.

In response to this:

/triage approved

+1 for the change
As @neolit123 noted out, if we make some field mutable and they are expected to be persisted in the kubeadm config map (or in some other kubeadm managed resource), we should make sure this happens as well.

We already have several examples of this in https://github.com/fabriziopandini/cluster-api/blob/67c222eaaddbee7c48f5514d9f1d86c3c086069d/controlplane/kubeadm/internal/controllers/upgrade.go#L57-L123 (as well as the framework for managing kubeadm config versions)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fabriziopandini
Copy link
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 29, 2024
@adityabhatia
Copy link
Contributor

/assign

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants