Pod to Pod Communication severely degraded in 4.11 on vSphere #1550
What is the VMware hardware version of the VMs?
They are:
Is it reproducible in 4.12?
Yes, we upgraded a cluster to 4.12 and were able to reproduce it.
Right, so it's possible the kernel module or OVN has regressed. Could you check whether node-to-node performance has degraded too? If yes, it's probably a Fedora / kernel regression.
Node-to-node performance is good; I tested on the nodes themselves using the toolbox.
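For reference, a node-level check along these lines can be run from the FCOS toolbox on two nodes. This is just a sketch: it assumes iperf3 can be installed inside the toolbox container, and the node IP is a placeholder.

```bash
# On node A: enter the toolbox container, install iperf3, start a server.
toolbox
dnf install -y iperf3
iperf3 -s

# On node B: same toolbox setup, then run the client against node A's IP.
iperf3 -c <node-a-ip> -t 10
```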
In that case it's probably OVN. I wonder if we could confirm it's OKD-specific?
We have a few OCP clusters at 4.11 and I haven't been able to reproduce the problem in them.
When you jump from 4.11 (4.11.0-0.okd-2022-08-20-022919 or so) to 4.12.0-0.okd-2023-02-18-033438 or newer you should see better network performance.
That is not the case for us. I have upgraded to version 4.12.0-0.okd-2023-04-01-051724 and am still seeing the same issues. I still can't run any test across pods without them timing out.
Do we need to open an issue in the ovn-kubernetes repository?
I think you should open an issue in the repository https://github.com/openshift/ovn-kubernetes/.
We re-deployed the cluster with version 4.10.0-0.okd-2022-07-09-073606 on the same hardware and the issue went away. There is clearly an issue with 4.11 and above. Benchmark results are below:
I tested this on a cluster using openshiftSDN: deployed version 4.10, upgraded to 4.11, and was able to replicate the issue. So it's not specific to OVN.
So the issue #1563 for 4.11.0-0.okd-2022-12-02-145640 is reproducible in that case.
Yes, for #1563, I checked that all the nodes (except the remote worker nodes), master, storage and worker, are already on the same ESX host (ESXi 6.7 and later (VM version 14)).
I will try to reproduce here, but it would be good to know if I am replicating what was already provisioned. Again, can I get the ESXi version and build numbers and the FCOS kernel version? Please be specific. Remember vSphere 6.x is EOL, and some older versions have issues with VXLAN with ESXi and kernel drivers.
In our case the ESXi version info is in the initial post.
Kernel: Linux version 6.0.18-200.fc36.x86_64 (mockbuild@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), GNU ld version 2.37-37.fc36) #1 SMP PREEMPT_DYNAMIC Sat Jan 7 17:08:48 UTC 2023
vSphere:
okd version: 4.12.0-0.okd-2023-04-01-051724
ESXi: VMware ESXi, 8.0.0, 20513097
[iperf client and server results attached]
Not really seeing a problem here.
I can't even get iperf tests to run when the pods are on nodes that are on different ESX hosts; it just times out.
In my case I can't even get the console up anymore. I've reproduced it over and over, and now it looks like someone else has also. Not sure what to do other than stay at 4.10.
@MattPOlson based on your previous comments this looks to me like MTU or something with VXLAN. Have you checked all the virtual switches and the physical device MTU? And am I correct in stating that when all the guests reside together there is no performance issue? Is there a specific ESXi host that is OK?
We've tried MTU settings and set them to match the host. But why would that affect 4.11 and not 4.10? I can spin up a cluster on 4.10 and it works perfectly, upgrade it to 4.11 and change nothing else, and it breaks. And yeah, if all the nodes reside together there is no issue.
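For what it's worth, a quick way to compare the physical NIC MTU with the overlay tunnel MTU on a node looks like the sketch below. The interface names are examples and differ per environment (ens192 for the uplink; genev_sys_6081 for OVN-Kubernetes, vxlan_sys_4789 for openshiftSDN).

```bash
# MTU on the node's uplink NIC (example name).
ip -o link show ens192 | grep -o 'mtu [0-9]*'

# MTU on the overlay tunnel device (example name for OVN-Kubernetes).
ip -o link show genev_sys_6081 | grep -o 'mtu [0-9]*'
```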
Don't you find that odd? If the problem occurs when packets are leaving the ESXi host then I would suspect something physical. I can't comment on why the version would make a difference, but I can't reproduce it.
Agreed, but I also find it odd that upgrading to 4.11 breaks it and that someone else was able to reproduce it. To me that suggests it's not something specific to our environment.
We do test OCP and OKD in multiple different vSphere environments and haven't seen this issue. Maybe you and @imdmahajankanika stumbled into the same problem? The question is what the commonality is.
Right, that is the question. Interestingly, we have a few OCP clusters running at 4.11 on the exact same hardware and don't see the issue there.
After upgrading our clusters from
on the master nodes. I have a feeling that we are looking at the very old bug openshift/machine-config-operator#2482. Could you check the state of tunnel offloading on your nodes with
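The command referenced above wasn't captured here; a typical way to inspect the offload state with ethtool looks like this (the interface name is an example):

```bash
# Show the UDP tunnel segmentation offload features on the vmxnet3 NIC.
ethtool -k ens192 | grep tx-udp_tnl
# The output lists tx-udp_tnl-segmentation and tx-udp_tnl-csum-segmentation
# as either "on" or "off".
```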
That shouldn't be an issue, as we haven't removed the workaround. And it is in specific older ESXi versions; if you are hitting the VXLAN offloading bug you need to upgrade your hosts.
I think I figured it out: your workaround isn't working anymore. It looks like there is a permission issue and NetworkManager-dispatcher.service is failing to run the script /etc/NetworkManager/dispatcher.d/99-vsphere-disable-tx-udp-tnl.
Disabling tunnel offloading seems to fix the problem. I'm looking into why that script is now getting permission-denied errors.
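For context, the workaround is a NetworkManager dispatcher script that turns these offloads off whenever an interface comes up. A rough sketch of that approach (not the exact script shipped in the image) looks like:

```bash
#!/bin/bash
# Sketch of a dispatcher script in the spirit of 99-vsphere-disable-tx-udp-tnl.
# NetworkManager calls dispatcher scripts with: $1 = interface, $2 = action.
IFACE="$1"
ACTION="$2"

if [ "$ACTION" = "up" ]; then
    # Only vmxnet3 NICs are affected by the VXLAN offload bug.
    driver="$(readlink -f "/sys/class/net/${IFACE}/device/driver" 2>/dev/null)"
    if [[ "${driver}" == *vmxnet3* ]]; then
        ethtool -K "${IFACE}" tx-udp_tnl-segmentation off
        ethtool -K "${IFACE}" tx-udp_tnl-csum-segmentation off
    fi
fi
```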
VMware only updated their release notes for 6.7 to say this issue is resolved; I am unsure what 7.x build fixes it. We certainly need to figure out the permission issue, which is strange. I would figure other dispatcher scripts would be breaking too.
It looks like this bug was already reported and fixed, but I'm definitely still seeing the issue in our environment.
Perhaps it's also #1475?
I think I figured something out: the script that rhcos-selinux-policy-upgrade.service executes to reload SELinux never runs, because it's looking for RHEL_VERSION in /usr/lib/os-release. That exists in Red Hat Enterprise Linux CoreOS but not in Fedora, so it never hits the line that calls semodule -B.
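To illustrate, the guard described above is roughly of this form (a sketch, not the actual script in the image):

```bash
#!/bin/bash
# Only rebuild the SELinux policy when the OS identifies itself as RHEL CoreOS.
# Fedora CoreOS has no RHEL_VERSION in /usr/lib/os-release, so this is skipped.
if grep -q '^RHEL_VERSION=' /usr/lib/os-release; then
    semodule -B
fi
```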
@jcpowermac We are running on ESXi 7u3 and this issue should be fixed. Maybe this has come up again in newer versions of
@bo0ts I would suggest opening a support request with VMware. They own both aspects of this, the Linux kernel driver [0] and ESXi. [0] - https://github.com/torvalds/linux/commits/master/drivers/net/vmxnet3
Over in the Slack thread there is also discussion about why and where this occurs. We don't see that problem on our clusters. But perhaps we simply don't do enough intra-cluster communication? Can you give us an easy test so we can verify whether we indeed don't have the problem or just don't see it?
The easiest way is to deploy an iperf client on one node and an iperf server on another node, then run a test between them to check performance.
Ok, I guess something like that: https://github.com/InfuseAI/k8s-iperf
I've had good luck with this one:
cat Dockerfile | oc new-build --name perf -D -, then created a deployment for both client and server. I just found this on a blog post somewhere ;-)
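A minimal version of that test with plain oc, assuming an image that contains iperf3 (the image and node names below are placeholders), could look like:

```bash
# Server pod pinned to one node (image and node names are placeholders).
oc run iperf-server --image=networkstatic/iperf3 \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"worker-a"}}' \
  -- iperf3 -s

# Find the server pod IP, then run a client pod pinned to a different node.
oc get pod iperf-server -o wide
oc run iperf-client --rm -it --restart=Never --image=networkstatic/iperf3 \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"worker-b"}}' \
  -- iperf3 -c <server-pod-ip> -t 10
```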
Hello! In my case, just by executing |
The disable is done via a NetworkManager dispatcher script, so that kind of makes sense. I wonder why it doesn't work the first time.
When I checked initially via "systemctl status NetworkManager-dispatcher.service", I found two types of errors:
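The full error text for failed dispatcher runs can usually be pulled from the journal, e.g.:

```bash
# Show recent NetworkManager-dispatcher logs, including failed script runs.
journalctl -u NetworkManager-dispatcher.service --since "1 hour ago" --no-pager
```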
I also see the failed access in my environment. In my opinion, it is due to SELinux.
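One way to confirm that SELinux is behind the failed access is to look for AVC denials on the node, e.g.:

```bash
# Look for recent AVC denials mentioning the dispatcher scripts.
ausearch -m avc -ts recent 2>/dev/null | grep -i dispatcher
# Without ausearch, the audit log can be grepped directly:
grep -i 'avc.*denied' /var/log/audit/audit.log | grep -i dispatcher
```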
Hello, did you try with or
I had tried that; I can confirm that the offload parameters are also set in my environment.
I ran network performance tests using iperf before and after changing the offload parameters. I used
The difference between the tests is tiny. The difference in network speed between two pods on different nodes and between two VMs is very large (between two VMs the speed is around 7x faster), but according to my current knowledge this is due to OVN.
Running this command fixes the issues for us:
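The exact command wasn't captured in this thread; disabling UDP tunnel offload by hand is typically along these lines (the interface name is an example):

```bash
# Turn off UDP tunnel segmentation offload on the vmxnet3 NIC (example name ens192).
sudo ethtool -K ens192 tx-udp_tnl-segmentation off tx-udp_tnl-csum-segmentation off
```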
With offload on, communication between pods on different nodes is really bad. I have also found that if we upgrade the vSphere Distributed Switch to version 7.0.3 the problem goes away; speeds are normal with offload on.
Describe the bug
We run OKD in a vSphere environment with the below configuration:
After upgrading the cluster from a 4.10.x version to anything 4.11.x or above, pod-to-pod communication is severely degraded when the nodes that the pods run on are hosted on different ESX hosts. We ran a benchmark test on the cluster before the upgrade with the below results:
After upgrading to version 4.11.0-0.okd-2023-01-14-152430 the latency between the pods is so high that the benchmark test, qperf test, and iperf test all time out and fail to run. This is the result of curling the network check pod across nodes; it takes close to 30 seconds.
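For reference, the latency of such a check can be made explicit with curl's timing output (the pod IP and port are placeholders):

```bash
# Report only the total request time against the network check pod on another node.
curl -s -o /dev/null -w 'time_total: %{time_total}s\n' http://<pod-ip>:<port>/
```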
We have been able to reproduce this issue consistently on multiple different clusters.
Version
4.11.0-0.okd-2023-01-14-152430
IPI on vSphere
How reproducible
Upgrade or install a 4.11.x or higher version of OKD and observe the latency.