
memorySwap: kubelet fails due to missing feature gate #8392

Closed
hakoerber opened this issue Jan 8, 2022 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@hakoerber
Contributor

Environment:

  • Cloud provider or hardware configuration: Hetzner VM

  • OS: CentOS 7

  • Version of Ansible: ansible 2.9.14

  • Version of Python: Python 3.8.10

Kubespray version: 52266406, latest master at the time this issue was opened

Network plugin used: calico


I am using kubespray on a server with swap, so I have kubelet_fail_swap_on set to false. Since #8241, this also enables the alpha-stage memorySwap functionality of the kubelet. Unfortunately, this fails (even on Kubernetes v1.23.1) due to a missing feature gate. The kubelet's stdout shows:

E0108 21:03:53.476082    5185 server.go:225] "Failed to validate kubelet configuration" err="invalid configuration: 
MemorySwap.SwapBehavior cannot be set when NodeSwap feature flag is disabled"

Note that the NodeSwap feature gate is disabled by default: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/

When I enable the feature gate explicitly, all is well:

kube_feature_gates:
- NodeSwap=true
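
For reference, the relevant part of the kubelet configuration that works for me then looks roughly like this (a sketch; the swapBehavior value is just an example and depends on the inventory):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
# Without this featureGates entry, the memorySwap block below triggers the
# validation error shown above.
featureGates:
  NodeSwap: true
memorySwap:
  swapBehavior: LimitedSwap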

I'm honestly not sure how best to continue here. Maybe a documentation change would already be enough, emphasizing that the feature gate must be enabled when using kubelet_fail_swap_on: false. Otherwise, I can think of the following options:

  1. Enable the NodeSwap feature gate implicitly when setting kubelet_fail_swap_on: false. I am not sure how this could be handled sanely in Ansible.
  2. Decouple the memorySwap kubelet configuration from kubelet_fail_swap_on, so it is possible to use swap without using the memorySwap functionality. This might be desirable when one does not want to use alpha-level features but still wants to use swap on the node itself. The new variable could be something like enable_k8s_node_swap_usage (see the template sketch below).
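
To illustrate the second option, the kubelet config template could guard the memorySwap block behind such a variable, roughly like this (just a sketch, not the actual Kubespray template; kubelet_swap_behavior is a made-up variable name):

failSwapOn: {{ kubelet_fail_swap_on | bool | lower }}
{# Hypothetical: only emit memorySwap when the new variable is set, so swap
   can be used on the node without opting into the alpha memorySwap feature. #}
{% if enable_k8s_node_swap_usage | default(false) | bool %}
memorySwap:
  swapBehavior: {{ kubelet_swap_behavior | default('LimitedSwap') }}
{% endif %}

With something like this, kubelet_fail_swap_on would only control failSwapOn, and the alpha feature would stay strictly opt-in.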

What do you think?

@hakoerber added the kind/bug label on Jan 8, 2022
@cristicalin
Contributor

Note that this is an alpha feature and it may still change.

You have an example of how to set it up in the CI job: https://github.com/kubernetes-sigs/kubespray/blob/master/tests/files/packet_fedora35-calico-swap-selinux.yml

As discussed in #8241, this will remain experimental support until the feature graduates to beta and we know what it will look like in a stable implementation, so that it can be documented.

@hakoerber
Contributor Author

You have an example of how to set it up in the CI job: https://github.com/kubernetes-sigs/kubespray/blob/master/tests/files/packet_fedora35-calico-swap-selinux.yml

Yes, this is the configuration that I'm currently using and that works for me. The issue is that I cannot opt out of this feature without setting kubelet_fail_swap_on: true, which means that I cannot use swap at all on the node. So there is no way to use swap without also using this alpha feature.

This may be acceptable, but I think it would be clearer to decouple this feature from the kubelet_fail_swap_on setting.

@cristicalin
Contributor

Can you detail the use-case of allowing the node to have swap but not allowing the kubelet to track the usage?

Unless you request swap for your pods, this feature should not have an impact.

@hakoerber
Contributor Author

I personally don't have one 😄

I was just surprised by the failure of the kubelet after the update, as kubelet_fail_swap_on now requires another setting (the feature gate) to work.

I guess this implicit coupling of settings should either:

  • Be broken up (by having different settings for each)
  • Be documented
  • Be automatically applied, by enabling the feature gate automatically when enabling swap (a rough sketch of what this could look like follows below)
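
For the last option, I imagine something along these lines in the group vars (just a sketch with simplified variable handling, not a tested change):

# Hypothetical: derive the feature gate from kubelet_fail_swap_on so that
# users of swap don't have to enable NodeSwap themselves.
kube_feature_gates: "{{ ['NodeSwap=true'] if not (kubelet_fail_swap_on | bool) else [] }}"

This would of course still have to be merged with any feature gates the user sets explicitly.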

@oomichi
Contributor

oomichi commented Jan 12, 2022

I personally don't have one 😄

I was just surprised by the failure of the kubelet after the update, as kubelet_fail_swap_on now requires another setting (the feature gate) to work.

I guess this implicit coupling of settings should either:

* Be broken up (by having different settings for each)
* Be documented
* Be automatically applied, by enabling the feature gate automatically when enabling swap

I feel "Be documented" is necessary at least.
"Be automatically applied, by enabling the feature gate automatically when enabling swap" is also reasonable from the user's viewpoint, but it would make Kubespray depend on an alpha feature of Kubernetes, which can still change.
That would put a maintenance burden on the Kubespray side to keep track of which Kubernetes versions need the alpha-feature configuration and which need the beta-feature configuration.
I think that is why @cristicalin didn't make the configuration automatic.
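
To make that version tracking concrete, the automated approach would need version-aware logic on the Kubespray side along these lines (purely a sketch, not Kubespray code; nodeswap_gate_removed_version is a made-up placeholder for whichever release no longer needs the explicit gate):

- name: Enable the NodeSwap feature gate for swap-enabled nodes
  ansible.builtin.set_fact:
    kube_feature_gates: "{{ (kube_feature_gates | default([])) + ['NodeSwap=true'] }}"
  when:
    - not (kubelet_fail_swap_on | bool)
    - kube_version is version(nodeswap_gate_removed_version, '<')

That placeholder version would have to be maintained as the feature moves from alpha to beta.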

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 12, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on May 12, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
