Disable tx and rx offloading on VXLAN interfaces #1282
Conversation
Fixes #1282

Seems that some kernel versions have issues with VXLAN checksum offloading, causing flannel to stop working in some scenarios: the traffic is encapsulated, but the checksum is wrong and the packet is discarded by the receiver. A known workaround is disabling offloading on the flannel interface:

ethtool --offload flannel.1 rx off tx off

With this change, flannel disables tx and rx offloading on VXLAN interfaces.
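For readers who want to see the mechanism, here is a minimal Go sketch of the equivalent of that ethtool command, assuming the github.com/safchain/ethtool package (the PR itself may implement this differently); the exact feature names vary by kernel and driver:

```go
package main

import (
	"log"

	"github.com/safchain/ethtool"
)

// disableChecksumOffload mirrors `ethtool --offload <dev> rx off tx off`
// by clearing the checksum-offload features on the device.
func disableChecksumOffload(dev string) error {
	et, err := ethtool.NewEthtool()
	if err != nil {
		return err
	}
	defer et.Close()

	// Assumption: these are the common generic checksum feature names;
	// a given driver may expose different ones (see `ethtool -k <dev>`).
	return et.Change(dev, map[string]bool{
		"rx-checksum":            false,
		"tx-checksum-ip-generic": false,
	})
}

func main() {
	if err := disableChecksumOffload("flannel.1"); err != nil {
		log.Fatalf("disabling checksum offload on flannel.1: %v", err)
	}
}
```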
@Capitrium seems that other people who have tested the patch have it working.
ping @rajatchopra :-)
@aojea any idea on which kernel versions are affected?
No idea, but this issue created an explosion of issues opened against Kubernetes and related projects. It was also discussed on the sig-network mailing list: https://groups.google.com/d/msg/kubernetes-sig-network/JxkTLd4M8WM/EW8O1E0PAgAJ
@aojea Yeah, looks like it was an issue with configuration or stale pods on my end - rebuilt and redeployed, seems like it's working now. I was still seeing networking issues with pods on one node after deploying the patch and had to kill the node, but I was doing a fair amount of testing with different kube-proxy/kube-router/flannel versions and probably broke something else in the process. Most existing nodes and all new nodes are working properly. 👍
Very cool @aojea. This created a lot of discussion in the Kops project as well. Really happy to see that there is a solution :).
@aojea I am not sure if we should disable rx/tx checksum offload always. A config parameter maybe? This is a temporary hack, right?
@dcbw posted more info here kubernetes/kubernetes#88986 (comment)
@rajatchopra I've considered that, but even if we make it configurable we have to disable offloading by default or people will keep hitting the bug ... is it worth the effort? I personally don't think anybody is going to enable it afterward, and this will protect flannel from future breakages or odd scenarios like the one described in the weave issue 🤷
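To make the trade-off in this exchange concrete, a hypothetical sketch of what such a knob could look like; the struct and field names are invented for illustration and are not flannel's actual backend config API:

```go
package config

// vxlanBackendConfig is a hypothetical config shape, invented for
// illustration only; flannel's real VXLAN backend config differs.
type vxlanBackendConfig struct {
	VNI                    int  `json:"VNI"`
	DisableChecksumOffload bool `json:"DisableChecksumOffload"` // invented knob
}

// disableChecksumOffload is the helper from the earlier sketch,
// stubbed here so this snippet is self-contained.
func disableChecksumOffload(dev string) error { return nil }

// setupVXLANDevice shows where such a knob would be consulted.
func setupVXLANDevice(cfg vxlanBackendConfig, devName string) error {
	// ... create the vxlan link here ...

	// Per the discussion above, the default would still need to be
	// "disabled", or users would keep hitting the kernel bug, which
	// is what makes the knob of questionable value.
	if cfg.DisableChecksumOffload {
		return disableChecksumOffload(devName)
	}
	return nil
}
```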
Kernel 3.10.0, which is used in RHEL 7/CentOS 7 with minor variations (but I've checked, and this happens in CentOS from 7.0 to 7.7 and with the latest RH-provided kernel).
@rajatchopra any plans for a bugfix release containing this?
The "proper" fix for this was accepted by The comment says that |
Oddly enough, if I run VXLAN using the flannel binary directly, it works, whereas if I deploy it using kubeadm, SVC access times out in 63 seconds.
I did a control-group test and found the cause of the problem. The OS is ...
The control group above proved that flannel was not the source of the problem, so I made a second control group:
So, if kube-proxy is running in a pod, it will trigger the kernel bug. I found this by comparing submissions on GitHub, and built a hacked image:

FROM k8s.gcr.io/kube-proxy:v1.17.5
RUN rm -f /usr/sbin/iptables && \
    clean-install iptables

After I set the images to use the hacked image, it never causes 63-second delays in VXLAN mode. @danwinship please take a look at why iptables is not installed in the docker image.
@zhangguanzhang impressive work. iptables is installed in that image; however, due to another bug, it has to use the latest version and use a script to detect whether it should use the nft or legacy backend.
So, do you think that the iptables version is the trigger?
I do. The recent release of Kubernetes 1.16.10 is also affecting 1.16 for the first time, so I studied the diffs yesterday. The only thing I see is that there is an iptables container build where the version was bumped from 11.x to 12.x, which 1.17 also did.
The problem, in terms of docker image modification, is with iptables
The iptables packaging bump triggered the bug. [EDIT: ...]
Yet the workaround often suggested seems to be disabling offload, and it does have an effect on the issue.
Stable kernels 5.6.13, 5.4.41, 4.19.123, 4.14.181 and later have the checksum patch included. |
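A sketch of how an operator-side check could compare the running kernel against those fixed releases, assuming one only wants the ethtool workaround on unpatched kernels; version parsing is simplified, and distro kernels with backported fixes (for example RHEL's 3.10.0-x series) are not handled:

```go
package main

import (
	"fmt"
	"log"

	"golang.org/x/sys/unix"
)

// fixedKernels lists the first stable release per series that includes
// the vxlan checksum fix, per the comment above.
var fixedKernels = [][3]int{{5, 6, 13}, {5, 4, 41}, {4, 19, 123}, {4, 14, 181}}

// kernelRelease returns the running kernel version string, e.g. "5.4.41-generic".
func kernelRelease() (string, error) {
	var uts unix.Utsname
	if err := unix.Uname(&uts); err != nil {
		return "", err
	}
	// Release is a NUL-terminated fixed-size byte array.
	rel := uts.Release[:]
	for i, c := range rel {
		if c == 0 {
			rel = rel[:i]
			break
		}
	}
	return string(rel), nil
}

// hasChecksumFix reports whether the given version is at or above the
// fixed release of its stable series (newer series assumed fixed).
func hasChecksumFix(major, minor, patch int) bool {
	for _, f := range fixedKernels {
		if major == f[0] && minor == f[1] {
			return patch >= f[2]
		}
	}
	return major > 5 || (major == 5 && minor > 6)
}

func main() {
	rel, err := kernelRelease()
	if err != nil {
		log.Fatal(err)
	}
	var major, minor, patch int
	fmt.Sscanf(rel, "%d.%d.%d", &major, &minor, &patch)
	fmt.Printf("kernel %s, vxlan checksum fix assumed present: %v\n",
		rel, hasChecksumFix(major, minor, patch))
}
```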
@MarkRose Which kernels in CentOS?
Trying to test a Flatcar image with one of those kernels, but issues at Quay are making for a fun day...
Just going to summarise what I know about this for sure now:
Which says that while it's related to vxlan offload in some manner (because the workaround seems to work...), it is not that exact kernel bug (because it makes no difference when tested). Looking to understand this random-fully thing now.
@jhohertz and just to confirm, you're running RHEL/CentOS 7 with kernel 3.10.something?
Not here; I have been using CoreOS/Flatcar Container Linux.
@danwinship just as a follow-up from our sig-network meeting, I'll post the test results from disabling --random-fully scenarios in kube-proxy and flannel, and also the generated rules for each scenario. This is the same environment where I could confirm that disabling tx offload solves the issue (CentOS 7 + Flannel 0.11), but in this case I'm enabling tx offload again in everything:
Scenario 1 - Both kube-proxy and flannel with MASQUERADE rules created with --random-fully. NodePort: exactly 63s. IPTables rules containing --random-fully:
Scenario 2 - Original kube-proxy with random-fully and flannel recompiled without it. IPTables rules containing --random-fully:
Scenario 3 - kube-proxy without random-fully and original flannel with random-fully enabled. IPTables rules containing --random-fully:
Scenario 4 - kube-proxy and flannel without random-fully. IPTables rules containing --random-fully: NONE

This way, we can say almost for sure that the insertion of --random-fully in the MASQUERADE rules triggered an already existing kernel bug in the vxlan path :) I'll post this also in the original issue in the k/k repo, and please let me know if I can help with any further tests.
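For reference, a sketch of the kind of MASQUERADE rule these scenarios toggle, using the github.com/coreos/go-iptables package (which flannel itself uses); the 10.244.0.0/16 pod CIDR and the chain placement are assumptions for illustration, and --random-fully requires iptables >= 1.6.2:

```go
package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatal(err)
	}

	podCIDR := "10.244.0.0/16" // assumption: a typical flannel pod network

	// The variant that triggers the kernel bug: MASQUERADE with
	// fully randomized source-port allocation.
	err = ipt.AppendUnique("nat", "POSTROUTING",
		"-s", podCIDR, "!", "-d", podCIDR,
		"-j", "MASQUERADE", "--random-fully")
	if err != nil {
		log.Fatal(err)
	}

	// Scenario 4 above corresponds to the same rule with the
	// "--random-fully" argument simply omitted.
}
```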
Also, I've seen some discussion here about iptables-legacy vs nft, and this is starting to hit CentOS 8 users: kubernetes/kubernetes#91331 I was going to ask what the effort and gain would be of putting the iptables-wrapper to work here, but as noted in the README it seems it wouldn't solve the CentOS 8 case :/
/close
This is no longer needed thanks to @danwinship 👏
@aojea will kubernetes/kubernetes#92035 be cherry-picked to older k8s releases?
It should be backported to the supported releases; just ping the author in the PR to ask if he can do it. If he cannot, the process is described here: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-release/cherry-picks.md
Thanks @aojea and @danwinship 😄
Thanks, I used the latest version v0.21.5 plus this PR patch to successfully fix the 63-second connection delay problem. |
Description
Seems that some kernel versions have issues with VXLAN checksum
offloading, causing flannel to stop working in some scenarios
where the traffic is encapsulated but the checksum is wrong and
the packet is discarded by the receiver.
A known workaround is disabling offloading on the flannel
interface:
ethtool --offload flannel.1 rx off tx off
This PR forces offloading to always be disabled on VXLAN interfaces.
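To check that the change took effect on a node, one can read the device features back; a minimal sketch, again assuming the github.com/safchain/ethtool package (feature names vary by kernel and driver):

```go
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/safchain/ethtool"
)

func main() {
	et, err := ethtool.NewEthtool()
	if err != nil {
		log.Fatal(err)
	}
	defer et.Close()

	features, err := et.Features("flannel.1")
	if err != nil {
		log.Fatal(err)
	}

	// After this PR, checksum-related features on the vxlan device
	// should all report false (off).
	for name, on := range features {
		if strings.Contains(name, "checksum") {
			fmt.Printf("%s: %v\n", name, on)
		}
	}
}
```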
Todos
Release Note