60+ seconds stuck when calling an HTTP service pod #1679
Comments
Hi @rkonfj, thanks for reporting this.
@rbrtbnfgl pod-to-pod via the service IP works fine, but node-to-pod via the service IP gets stuck.
I understand your issue, but I still didn't get the same result.
Could you share the content of ...?
@rbrtbnfgl the content of this file is consistent on each node.
@rbrtbnfgl let's confirm whether the nat rules formed by ... are present.
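The exact rules being referred to are lost in this copy of the thread; a generic way to dump a node's nat table for comparison (an illustration, not the command from the discussion):

# Dump the whole nat table, including any flannel- and kube-proxy-managed chains,
# so the POSTROUTING rule order can be compared across nodes.
iptables-save -t nat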
Ok, now it's clear. This bug only happens on some kernel versions, which is why it wasn't happening on my setup. I'll try to update the iptables rules.
Thanks @rkonfj for finding that. The fix will also speed up the MASQUERADE process.
It works well.
Thanks, I'll update the code to add this additional rule.
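The rule itself is not quoted in this copy of the thread; judging from the masquerade chain that shipped in later flannel releases, it is presumably something along these lines (an assumption, not a quote from the discussion):

# Assumed shape of the additional rule: return early for packets that kube-proxy has
# already marked for masquerading (mark 0x4000), so flannel does not NAT them a second time.
iptables -t nat -A FLANNEL-POSTRTG -m mark --mark 0x4000/0x4000 -j RETURN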
@rkonfj we believe your kernel still has a vxlan bug, which makes you see this problem when double-natting. We can avoid it by not double-natting, as @rbrtbnfgl suggests. But just to verify, with the original flannel iptables rules and thus double-natting, could you execute this on your nodes:
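The command itself was lost in this copy of the thread; assuming it is the usual workaround for the kernel vxlan UDP-checksum bug, it would look something like:

# Assumed reconstruction: disable UDP checksum offload on the vxlan interface so the
# kernel bug does not corrupt checksums on the double-NATed traffic.
ethtool -K flannel.1 tx-checksum-ip-generic off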
And then try again. That should remove the vxlan bug from the equation, and thus it should work even with double-natting.
Yes, it works.
I've been banging my head on a possibly related issue. I built a Kubernetes cluster from four Ubuntu 22.04 servers. Most stuff is working fine, but a few things that seem to involve leaving or entering the pod network are messed up. I had to run ...
I suspected a NAT/masquerading issue and noticed that the first rule in the nat POSTROUTING chain seemed suspect. Here's what it looked like before I changed anything:
Annoyingly, I have twice tried adding the rule at the end and removing it from the beginning, but something (Flannel, I suppose) keeps recreating it at the top. I do this:
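The commands themselves were not preserved in this copy; a hypothetical sketch of the kind of reshuffling being described, assuming flannel's broad MASQUERADE rule sits at position 1 and a 10.244.0.0/16 pod CIDR:

# Hypothetical sketch, not the poster's actual commands.
# Delete the rule currently at position 1 of the nat POSTROUTING chain ...
iptables -t nat -D POSTROUTING 1
# ... and append an equivalent rule at the end of the chain instead.
iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE --random-fully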
Then I somehow wind up with this:
FWIW, my ... Any thoughts would be greatly appreciated. :-)
It shouldn't be the same issue. This issue was related to the UDP checksum and should be solved in v0.20.2.
Thank you for taking the time to reply, @rbrtbnfgl. (And thank you for your work on Flannel!) At work now, but I will document my set-up as best I can tonight in a new issue. I suspected the first rule in my nat POSTROUTING chain ...
You can increase the verbosity of the iptables output if you use ...
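The exact option was lost in this copy; presumably something along the lines of iptables' -v flag, e.g.:

# -v adds packet and byte counters per rule, which shows whether a rule is actually
# being hit; --line-numbers shows each rule's position in the chain.
iptables -t nat -L POSTROUTING -v -n --line-numbers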
Thanks! That's cool, @rbrtbnfgl. I used ...
This issue should be fixed with Flannel v0.20.2.
After upgrading flanneld to v0.20.1, a curl to an HTTP service pod on a different node via the ClusterIP gets stuck for 60+ seconds.

Expected Behavior
No stall.

Current Behavior
curl is stuck for 60+ seconds.

Possible Solution
Eh... it may be caused by double-NAT, I have no idea.

Steps to Reproduce (for bugs)
curl gets stuck when the nat POSTROUTING rules are in this order:
...

It works fine in this order:
...
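The original rule listings were not preserved in this copy; for illustration, these are the kinds of rules involved on a flannel + kube-proxy node, assuming a 10.244.0.0/16 pod CIDR (which relative order triggers the stall was shown in the lost listings):

# Illustrative only, not the reporter's actual listing.
# kube-proxy's masquerade hook:
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
# flannel: masquerade host traffic headed into the pod network:
-A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE --random-fully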
Context
This PR (kubernetes/kubernetes#92035) looks like it solves this issue, but I still have the problem when using flanneld v0.20.1.
Your Environment