May 2021 security updates to Linux kernel restricts access to conntrack sysctls, breaking kube-proxy in CAPD clusters #4712

Closed
randomvariable opened this issue Jun 2, 2021 · 9 comments · Fixed by #4717
Labels
  • area/bootstrap: Issues or PRs related to bootstrap providers
  • area/control-plane: Issues or PRs related to control-plane lifecycle management
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.

Comments

@randomvariable
Member

randomvariable commented Jun 2, 2021

What steps did you take and what happened:

  1. Create a kind cluster.
  2. Provision a CAPD cluster.
  3. The cluster gets created, but kube-proxy is in CrashLoopBackOff with:

I0602 14:00:54.955938       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 786432
F0602 14:00:54.956191       1 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied

What did you expect to happen:
Cluster to work

Anything else you would like to add:

This failure is caused by a change in the Linux kernel, torvalds/linux@671c54e, after which non-init network namespaces are no longer allowed to set the connection tracking sysctls. If kube-proxy's conntrack.max-per-core is set to anything other than 0 (including null, which falls back to the default), kube-proxy will attempt to set the sysctl.
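To make the "anything other than 0 (including null)" condition concrete, here is a minimal sketch of how I understand kube-proxy to derive the value it writes. It is an illustration in Python, not kube-proxy's actual Go code: the function name is hypothetical, the defaults (32768 per core, a floor of 131072) are taken from the documented `--conntrack-max-per-core` and `--conntrack-min` flags, and the 24-CPU host is an assumption chosen because it matches the 786432 in the log above.

```python
from typing import Optional

# Assumed defaults, taken from the documented kube-proxy flags
# --conntrack-max-per-core and --conntrack-min.
DEFAULT_MAX_PER_CORE = 32768
DEFAULT_MIN = 131072


def conntrack_max(max_per_core: Optional[int], num_cpus: int,
                  minimum: int = DEFAULT_MIN) -> int:
    """Return the nf_conntrack_max value to write, or 0 to leave it alone."""
    if max_per_core is None:
        # null is replaced by the default, so it does NOT mean "skip".
        max_per_core = DEFAULT_MAX_PER_CORE
    if max_per_core <= 0:
        # Only an explicit 0 makes kube-proxy leave the sysctl untouched.
        return 0
    # Scale by CPU count, but never go below the configured floor.
    return max(max_per_core * num_cpus, minimum)


if __name__ == "__main__":
    for setting in (None, 0):
        target = conntrack_max(setting, num_cpus=24)  # 24 CPUs is an assumption
        action = f"write {target}" if target else "skip the sysctl"
        print(f"maxPerCore={setting!r}: {action}")
```

With the value left as null, the sketch ends up wanting to write the sysctl, which is the write that the kernel now rejects inside the node container.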

I first attempted to set extraArgs for kube-proxy, but unfortunately this value gets overridden by the configmap that explicitly sets the value to null in the component config.

kind's workaround passes a kube-proxy component config directly to kubeadm to set the value to 0.

Affected Linux kernel versions include:
5.12
5.11
5.10
4.14
and 4.9 according to https://www.spinics.net/lists/stable/msg466347.html

Possible solutions:

  1. Allow configuration of kube-proxy component config in KCP
    Pros: Replicates workaround from kind
    Cons: How do we handle importing/copying the kube-proxy component config types from k/k?

  2. Add non-init namespace detection (if feasible) & kernel version detection logic to kube-proxy and skip setting conntrack
    Pros: Probably the most elegant solution if it can be done. No changes required to CAPI.
    Cons: Probably not backportable.

  3. Have CAPD patch the kube-proxy configmap as a special case.
    Pros: It will work
    Cons: Hackiest solution?
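For illustration, the change that options 1 and 3 both boil down to can be sketched as below. This is a hedged sketch, not CAPD's implementation: `disable_conntrack_max` is a hypothetical helper, it assumes the kube-proxy component config has already been parsed into a dict (reading it out of the ConfigMap and writing it back are left out), and it assumes the field is spelled `conntrack.maxPerCore` as in the KubeProxyConfiguration v1alpha1 type.

```python
import copy
from typing import Any, Dict


def disable_conntrack_max(proxy_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the config with conntrack.maxPerCore forced to 0."""
    patched = copy.deepcopy(proxy_config)
    # The generated config may carry "conntrack: null" or omit the key entirely.
    conntrack = patched.get("conntrack") or {}
    conntrack["maxPerCore"] = 0  # 0 tells kube-proxy not to touch the sysctl
    patched["conntrack"] = conntrack
    return patched


if __name__ == "__main__":
    # Hypothetical stand-in for the config that kubeadm generates.
    generated = {
        "apiVersion": "kubeproxy.config.k8s.io/v1alpha1",
        "kind": "KubeProxyConfiguration",
        "conntrack": {"maxPerCore": None},
    }
    print(disable_conntrack_max(generated)["conntrack"])
```

The helper returns a copy rather than mutating its input, so the original config stays available for comparison or rollback.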

Environment:

  • Cluster-api version: main branch cd3a694deac89d5ebeb888307deaa61487207aa0
  • Minikube/KIND version: v0.11
  • Kubernetes version: (use kubectl version): v1.21.1
  • OS (e.g. from /etc/os-release): Fedora 34, Linux 5.12.8-300.fc34.x86_64

/kind bug

/area control-plane
/area bootstrap

cc @fabriziopandini

k8s-ci-robot added the kind/bug, area/control-plane, and area/bootstrap labels on Jun 2, 2021
@elmiko
Contributor

elmiko commented Jun 2, 2021

this affected me as well. just for the record, i tried the same process on a fedora 34 install running kernel 5.11.12 and i do not see the sysctl error.

@neolit123
Member

neolit123 commented Jun 2, 2021 via email

@neolit123
Member

@aojea do you happen to know?

@randomvariable
Member Author

> Can something be done on the node images to mitigate instead of passing a component config?

If you're thinking "can we set the host value before spinning up", unfortunately not, because kube-proxy calculates the conntrack max sysctl value it wants to set at startup, from the per-core setting multiplied by the number of CPU cores.

@randomvariable
Member Author

> Do sig network know about this and are we going to patch kube proxy?

I pinged @andrewsykim and @jayunit100 for an opinion on Option 2.

randomvariable changed the title from "Linux 5.12 restricts access to conntrack sysctls, breaking kube-proxy in CAPD clusters" to "May 2021 security updates to Linux kernel restricts access to conntrack sysctls, breaking kube-proxy in CAPD clusters" on Jun 2, 2021
@fabriziopandini
Member

WRT the proposed options:

  1. requires some design because it could impact UX
  2. (I leave the opinion to the kube-proxy experts)
  3. as of today the most viable solution; it mimics what kind is doing.

@aojea

aojea commented Jun 2, 2021

> 2. (leave the opinion to the kube-proxy experts)
> Pros: Probably the most elegant solution if it can be done. No changes required to CAPI.

Are CAPD clusters production clusters or just for testing?
Putting a workaround in a "core" component of Kubernetes, one that also more or less changes a default value just for a test environment, seems a bit excessive ... unless containerised clusters become the norm in the future, in which case I agree it should be revisited.

> 3. as of today the most viable solution, it mimics what kind is doing.

I agree here. CAPD controls the environment, so it should know in advance what the best configuration is for each case. This also allows us to iterate if we need to revisit it, and the change is minimal.

@randomvariable
Member Author

Looks like the consensus is to do it in CAPD.

After chatting with @andrewsykim, I may speculatively open a PR to kube-proxy that makes setting the sysctls non-fatal when it gets permission denied, as this will help any nested run of kube-proxy.
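As a rough illustration of what "non-fatal on permission denied" could look like, here is a minimal Python sketch. It is not the kube-proxy change itself (kube-proxy is written in Go, and no such PR is shown in this thread): `set_conntrack_max` and `write_sysctl` are hypothetical names, and the writer is injectable only so the behaviour can be exercised without touching /proc.

```python
import logging
from typing import Callable

# Path taken from the error message in the report above.
CONNTRACK_MAX_PATH = "/proc/sys/net/netfilter/nf_conntrack_max"


def write_sysctl(path: str, value: int) -> None:
    """Write an integer sysctl through its /proc/sys file."""
    with open(path, "w") as handle:
        handle.write(str(value))


def set_conntrack_max(value: int,
                      write: Callable[[str, int], None] = write_sysctl) -> bool:
    """Try to set nf_conntrack_max; return False if the kernel refuses."""
    try:
        write(CONNTRACK_MAX_PATH, value)
    except PermissionError as err:
        # Expected in a non-init network namespace on patched kernels:
        # log a warning and carry on instead of exiting.
        logging.warning("leaving nf_conntrack_max unchanged: %s", err)
        return False
    return True
```

Only the permission error is swallowed here; any other failure still propagates, so a genuinely broken /proc would remain visible.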

@randomvariable
Member Author

/lifecycle active
/assign

k8s-ci-robot added the lifecycle/active label on Jun 3, 2021
6 participants