kube-router breaks iptables because it insists on using iptables table "filter" #1588

Closed
simonfelding opened this issue Dec 13, 2023 · 8 comments
@simonfelding

simonfelding commented Dec 13, 2023

What happened?
When I start kube-router, it creates the table "filter" and adds rules to it. Running iptables -L on the host then fails with this error:
iptables v1.8.5 (nf_tables): table 'filter' is incompatible, use 'nft' tool.

It's entirely avoidable by just not using the "filter" table when the nf_tables backend is used, as nftables doesn't attach special meaning to table names.
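
For anyone scripting around this failure, here is a minimal Python sketch that recognizes the error quoted above in captured stderr and returns the affected table name. The function name `incompatible_table` is hypothetical, and the pattern assumes the message wording shown in this report; it also accepts a backtick as the opening quote, since some iptables builds print the table name that way.

```python
import re
from typing import Optional

# Matches the message quoted in this issue. The opening quote may be a
# backtick or a straight quote depending on how the text was captured.
_INCOMPATIBLE = re.compile(r"table [`']([^']+)' is incompatible, use 'nft' tool")


def incompatible_table(stderr: str) -> Optional[str]:
    """Return the table named in an 'is incompatible' error, or None."""
    match = _INCOMPATIBLE.search(stderr)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "iptables v1.8.5 (nf_tables): table 'filter' is incompatible, use 'nft' tool."
    print(incompatible_table(sample))
```

The function only inspects text, so it can be fed the stderr of any iptables invocation without needing root on the machine doing the parsing.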

What did you expect to happen?
I expected kube-router to work without breaking iptables.

How can we reproduce the behavior you experienced?
Steps to reproduce the behavior:

  1. Install Oracle Linux 8
  2. Download k0s
  3. Run k0s install controller --single
  4. Run watch iptables -L

System Information:

  • Kube-Router Version (kube-router --version): 1.6.0
  • Kube-Router Parameters: default
  • Kubernetes Version (kubectl version) : v1.28.4
  • Cloud Type: on premise
  • Kubernetes Deployment Type: k0s
  • Kube-Router Deployment Type: [e.g. DaemonSet, System Service]
  • Cluster Size: 1 node
@simonfelding
Author

simonfelding commented Dec 13, 2023

I realize the latest version is kube-router v2.0.0, but I looked through the kube-router code and didn't find any changes that would have solved this.

@aauren
Collaborator

aauren commented Dec 13, 2023

Hey @simonfelding

First off, I'm not sure if you intended this or not, but your issue description comes off a bit abrasive. The sentiment seems to be that we insisted on breaking compatibility for your use-case. While that may happen from time to time accidentally, we certainly never intend to make design mistakes in kube-router, and we would appreciate it if, in the future, you would word your issue descriptions a bit more carefully so as to give us, and other open source projects, the benefit of the doubt in this respect.

In this case, we are not able to use any table other than the filter table because at its heart kube-router uses iptables the same way you are trying to on your node. The incompatibility that you're experiencing arises because the iptables binary that k0s includes in the kube-router container runs in nf_tables compatibility mode, also known as iptables-nft (https://wiki.nftables.org/wiki-nftables/index.php/Legacy_xtables_tools). You can see this by entering the kube-router container of k0s and looking at how it is linked:

k0s:~# iptables --version
iptables v1.8.9 (nf_tables)
k0s:~# which iptables
/sbin/iptables
k0s:~# ls -l /sbin/iptables
lrwxrwxrwx    1 root     root            23 Dec 13 20:42 /sbin/iptables -> /sbin/xtables-nft-multi

In this case, the fact that iptables points to xtables-nft-multi, along with (nf_tables) in the version string, indicates that it is using the nf_tables backend rather than the legacy xtables backend. In this mode, iptables keeps the iptables command structure, but uses nftables as the backend in the kernel.
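
The same check can be scripted. The following is a minimal Python sketch, not part of kube-router, that parses the banner printed by iptables --version into a version tuple and a backend name. The names `IptablesInfo` and `parse_iptables_banner` are hypothetical, and the pattern assumes the banner format shown in the transcript above; releases older than 1.8 print no backend suffix, which is reported here as None.

```python
import re
from typing import NamedTuple, Optional, Tuple


class IptablesInfo(NamedTuple):
    version: Tuple[int, ...]
    backend: Optional[str]  # e.g. "nf_tables" or "legacy"; None if not printed


# Banner examples: "iptables v1.8.9 (nf_tables)", "iptables v1.8.5 (legacy)"
_BANNER = re.compile(r"^iptables v(\d+(?:\.\d+)*)(?:\s+\((\w+)\))?")


def parse_iptables_banner(banner: str) -> IptablesInfo:
    """Split an `iptables --version` banner into version and backend."""
    match = _BANNER.match(banner.strip())
    if match is None:
        raise ValueError(f"unrecognized iptables banner: {banner!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    return IptablesInfo(version, match.group(2))


if __name__ == "__main__":
    print(parse_iptables_banner("iptables v1.8.9 (nf_tables)"))
```

Running it against the banner from both the host and the container makes it easy to see whether the two sides agree on backend and version.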

What is most likely happening is that the older iptables user-space tooling included in Oracle Linux 8 conflicts with the newer version that is bundled as part of k0s. Unfortunately, the user-space tooling for iptables often breaks compatibility between different versions of the tooling, even across bugfix releases. The kube-router project has encountered this problem frequently enough that we now call it out in the requirements section of our user documentation:

Additionally, when run in daemonset mode, it is highly recommended that you keep netfilter related userspace host tooling like iptables, ipset, and ipvsadm in sync with the versions that are distributed by Alpine inside the kube-router container. This will help avoid conflicts that can potentially arise when both the host's userspace and kube-router's userspace tooling modifies netfilter kernel definitions. See: #1370 for more information.

The linked issue above will give you another example of something similar happening to another user.
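
A rough way to act on that recommendation is to compare tool versions collected from the host and from the container. The sketch below is an assumption-laden illustration: `find_version_drift` is a hypothetical helper, the version strings are expected to have been gathered beforehand (for example from each tool's version output), and the ipset value in the example is made up.

```python
from typing import Dict, List


def find_version_drift(host: Dict[str, str], container: Dict[str, str]) -> List[str]:
    """List every tool whose version differs between host and container."""
    drift = []
    for tool in sorted(set(host) | set(container)):
        host_version = host.get(tool)
        container_version = container.get(tool)
        if host_version != container_version:
            drift.append(
                f"{tool}: host={host_version or 'missing'} "
                f"container={container_version or 'missing'}"
            )
    return drift


if __name__ == "__main__":
    # iptables versions are the ones mentioned in this thread;
    # the ipset version is purely illustrative.
    host_tools = {"iptables": "1.8.5", "ipset": "7.1"}
    container_tools = {"iptables": "1.8.9", "ipset": "7.1"}
    for line in find_version_drift(host_tools, container_tools):
        print(line)
```

An exact string comparison is deliberately strict here, since the thread notes that even bugfix releases of the tooling can be incompatible with each other.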

Unfortunately, in this respect there is nothing that kube-router can do. We've reached out upstream to see if the netfilter project will implement fixes for these compatibility breaks: https://bugzilla.netfilter.org/show_bug.cgi?id=1632 but there hasn't yet been a whole lot of progress on that issue. At the end of the day, both iptables and nftables were created before container use-cases really materialized and they didn't really architect for two different versions of their user-space tooling modifying the same kernel constructs.

@simonfelding
Author

simonfelding commented Dec 14, 2023

Hey @aauren - so sorry! Thank you for the thorough response. I did not mean to be abrasive or disrespectful and I didn't mean to make it sound like you intentionally introduced bugs. I was just trying to be succinct and helpful - I did a lot of research prior to posting the issue, including skimming through the kube-router code and doing all sorts of experiments. If I was more familiar with the code base I would have posted a pull request instead, but some people find that impolite too. I have deep respect for open source projects and would love to contribute and appreciate the work you put in it. I hope this is clear now!

I did see the other issues, although I wasn't aware of the netfilter issue.

Anyway, you're absolutely right - I did a lot of testing and comparing, and it seems to be purely because of the iptables version bundled with alpine, not because of kube-router. I tried entering the kube-router pod and doing iptables -L -t filter and doing iptables -L -t filter from my OS, as well as iptables-save > $file from the pod and iptables-restore $file from the OS. The rules look exactly the same and import just fine; the error seems to occur exactly when kube-router writes the KUBE-POD-FW rules - the filter table is compatible until then.
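
For anyone who wants to repeat that comparison, here is a small Python sketch that diffs two iptables-save dumps while ignoring comment lines and packet/byte counters. The helper names `normalize_dump` and `diff_dumps` are hypothetical, and the normalization assumes the usual dump layout (comment lines starting with #, counters written as [packets:bytes]).

```python
import difflib
import re
from typing import List

# Counters appear as "[packets:bytes]" on chain lines and, with -c, on rules.
_COUNTERS = re.compile(r"\[\d+:\d+\]")


def normalize_dump(dump: str) -> List[str]:
    """Drop comments, blank lines, and counters from an iptables-save dump."""
    lines = []
    for raw in dump.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        lines.append(_COUNTERS.sub("", line).strip())
    return lines


def diff_dumps(host_dump: str, pod_dump: str) -> List[str]:
    """Return a unified diff of two dumps; an empty list means no difference."""
    return list(
        difflib.unified_diff(
            normalize_dump(host_dump),
            normalize_dump(pod_dump),
            fromfile="host",
            tofile="pod",
            lineterm="",
        )
    )


if __name__ == "__main__":
    host = "# Generated on host\n*filter\n:INPUT ACCEPT [10:800]\nCOMMIT\n"
    pod = "# Generated in pod\n*filter\n:INPUT ACCEPT [0:0]\nCOMMIT\n"
    print(diff_dumps(host, pod))
```

Note that identical rule text does not guarantee compatibility: as described above, the dumps can match while the host tooling still rejects the table.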

So I was wondering why other CNIs I've used don't have this issue, and what I found was interesting:

  • kube-router uses the alpine:latest image and therefore gets the latest version of iptables (Dockerfile).
  • Calico uses the Red Hat Universal Base Image 8, which bundles iptables 1.8.5, a version compatible with RHEL 8 and Oracle Linux 8 (Dockerfile, UBI_IMAGE).
  • Cilium builds iptables 1.8.8-1 from source on Ubuntu 22.04 and bundles it, using ubuntu:22.04 as its base image (Dockerfile).

So that is one way to get iptables compatibility, but I doubt you're willing to change base images.

Any chance you'd be willing to use a slightly older version of iptables to support those of us who are forced to use RHEL-type distros as OS?

@aauren
Collaborator

aauren commented Dec 14, 2023

Hey Simon. Thanks for your thoughtful reply!

We use Alpine because that is the most common base used by most open source container distributions. It helps that it's incredibly lightweight. But it is also pretty ubiquitous within the community.

I think the number of Linux OS versions speaks to the number of different approaches to packaging. Everyone prioritizes different things. I think that the container universe represents that same range of differing opinions. Some people value long-term stability that is given by fixed release versions. Others value the features and bug-fixes that are present in rolling / bleeding edge distros.

In this case, I don't think that it's really the base image that causes the problem. We could move to a UBI image from RHEL; it would then work for people with older user-spaces, and it would break for people with newer user-spaces. This is actually EXACTLY what happened in #1370. Because their OS was running iptables 1.8.8 and kube-router was running a slightly older version of Alpine, it carried iptables 1.8.7 and it broke their use-case.

In the end, until upstream comes up with some sort of in-kernel version checking for the user-space, or guarantees backwards compatibility across, at the very least, bugfix releases, we'll always have a problem whether kube-router lags behind or moves forward. At least when we move forward, we keep better kernel compatibility and we are able to more quickly pick up bugfixes. For instance, look at the changelog between 1.8.8 and 1.8.9: https://www.netfilter.org/projects/iptables/files/changes-iptables-1.8.9.txt There are quite a few fixes in there that we'd be missing if we decided to pin versions.

@mrueg does more of the maintenance on the kube-router container and dependencies, so I'd be interested in hearing his opinion on this.

However, my current thoughts are that it would be relatively simple for you to pin the versions that you need of kube-router by creating your own version of the kube-router container that sits on a RHEL UBI base. This may even be beneficial to other members of the community that have similar problems with older user-spaces.

One final thought on Calico and Cilium. The reason why they use more conservative bases than we do is because both of them are much larger projects than kube-router, are more geared towards providing enterprise support, and are backed by companies who have many engineers that are paid to support the project. Unfortunately, kube-router never reached that status.

We don't offer paid support and as such aren't nearly as limited by an enterprise audience that primarily runs static long-term distros. That isn't to say that we don't do what we can for enterprise use-cases, but I wouldn't really say that it's our primary audience. Additionally, both of them have invested heavily in BPF lately, so they likely don't suffer as many issues with iptables user-space mismatches because they don't rely on iptables nearly as much as kube-router does.

@mrueg
Collaborator

mrueg commented Dec 19, 2023

I agree with @aauren here, that this kind of request unfortunately is out of scope.

If you would be willing to send a PR that adds a RHEL/UBI-based Dockerfile and adds that to the release pipeline to publish a kube-router:v2.x.x-ubi8 or similarly named container, we should be able to help with that (with the disclaimer, that there's no support around this container and it might get dropped if it breaks).

@jnummelin
Contributor

This is actually EXACTLY what happened in #1370. Because their OS was running iptables 1.8.8 and kube-router was running a slightly older version of Alpine, it carried iptables 1.8.7 and it broke their use-case.

Wanted to chime in as OP for that linked issue and as one of the maintainers of k0s. 😄

For those exact backward/forward compatibility issues, k0s has now moved to a pattern where we keep all the iptables versions in sync. Essentially, k0s provides a statically compiled binary on the host, mainly for kubelet, and we rebuild and repackage kube-router with a matching iptables version. We do the same for Calico too. Yes, as seen here, this can still cause problems with "native" host-level tools, which, like @aauren said, is something we unfortunately cannot really fix in either kube-router or k0s.

At the end of the day, both iptables and nftables were created before container use-cases really materialized and they didn't really architect for two different versions of their user-space tooling modifying the same kernel constructs.

This is IMO really the root of all these challenges and problems. And unfortunately they are not limited to only iptables, we've already encountered somewhat similar issues with ipset too. 😞

@simonfelding
Author

Brilliant! Thank you @jnummelin!

@aauren
Collaborator

aauren commented Jan 5, 2024

We're still open to a contribution as suggested by @mrueg in #1588 (comment)

However, unless someone from the community wants to contribute that and support it, I would say that the project is already as committed as we can be with the support of our existing container which should work with most recent distros.

The real fix for this would need to be upstream, so I would encourage anyone interested in seeing a real fix for this to work with the upstream netfilter project on a fix in the kernel and user-space tooling.

Closing this for now, as kube-router has done as much as we currently can to mitigate this issue.

@aauren aauren closed this as completed Jan 5, 2024