60+ seconds stuck when calling an HTTP service pod #1679

Closed

rkonfj opened this issue Nov 21, 2022 · 18 comments

rkonfj commented Nov 21, 2022

After upgrading flanneld to v0.20.1, curling an HTTP service pod on a different node via its ClusterIP gets stuck for 60+ seconds.

Expected Behavior

No stall.

Current Behavior

The request stalls for 60+ seconds.

Possible Solution

It may be caused by double NAT, but I'm not sure.

Steps to Reproduce (for bugs)

curl gets stuck when the nat POSTROUTING chain is ordered like this:

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
FLANNEL-POSTRTG  all  --  anywhere             anywhere             /* flanneld masq */
KUBE-POSTROUTING  all  --  anywhere             anywhere             /* kubernetes postrouting rules */

It works fine when the order is reversed:

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
KUBE-POSTROUTING  all  --  anywhere             anywhere             /* kubernetes postrouting rules */
FLANNEL-POSTRTG  all  --  anywhere             anywhere             /* flanneld masq */
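
For reference, the current order on a node can be listed with explicit rule positions (the two chains above are the ones created by flanneld and kube-proxy respectively):

# show the nat POSTROUTING chain with rule positions
iptables -t nat -L POSTROUTING --line-numbers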

Context

This PR (kubernetes/kubernetes#92035) looks like it should solve this issue, but I still hit the problem with flanneld v0.20.1.

Your Environment

  • Flannel version: v0.20.1
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: 3.5.3
  • Kubernetes version (if used): v1.25.4
  • Operating System and version: Archlinux (kernel version 6.0.8)
  • Link to your project (optional):
@rbrtbnfgl rbrtbnfgl self-assigned this Nov 22, 2022
rbrtbnfgl (Contributor) commented Nov 22, 2022

Hi @rkonfj, thanks for reporting this.
I tried to set up a cluster with flannel and two pods deployed on two different nodes (one client and the other running nginx).
When I use wget from the client pod against the exposed service address, the IPs are correctly translated by kube-proxy and the connection works fine.
Could you give more info about your setup?
Can you check the output of kubectl get service -A?

rkonfj (Author) commented Nov 22, 2022

@rbrtbnfgl pod-to-pod via the service IP may work fine, but node-to-pod via the service IP gets stuck.
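
To make the repro concrete, this is the kind of call that stalls (a sketch; the address is a placeholder for any ClusterIP service whose backing pod runs on another node):

# run from a node's shell, not from inside a pod; <ClusterIP>:<port> is a placeholder
curl -v -m 70 http://<ClusterIP>:<port>/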

rbrtbnfgl (Contributor) commented:

I understand your issue, but I still can't reproduce it.
I deployed the nginx pod on one node, ran curl from a different node, and it worked.

rbrtbnfgl (Contributor) commented:

Could you share the content of /etc/cni/net.d/10-flannel.conflist?

rkonfj (Author) commented Nov 23, 2022

@rbrtbnfgl the content of this file is identical on every node:

# cat /etc/cni/net.d/10-flannel.conflist
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
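
If it helps, a quick way to double-check the file really is byte-identical across nodes (node1/node2 are placeholders for your hosts):

# compare the CNI config across nodes
for h in node1 node2; do ssh "$h" md5sum /etc/cni/net.d/10-flannel.conflist; done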

rkonfj (Author) commented Nov 23, 2022

@rbrtbnfgl let's confirm whether there is double NAT.

The nat rules created by flanneld v0.20.1 and kube-proxy v1.25.4 are as follows:

# iptables -t nat -L POSTROUTING  # expanded from `FLANNEL-POSTRTG` and `KUBE-POSTROUTING`
Chain POSTROUTING (policy ACCEPT)
num  target     prot opt source
1    RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
2    MASQUERADE  all  --  10.244.0.0/16       !base-address.mcast.net/4  /* flanneld masq */ random-fully
3    RETURN     all  -- !10.244.0.0/16        k1/24                /* flanneld masq */
4    MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully

1    MASQUERADE  all  --  anywhere             anywhere             /* Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose */ match-set KUBE-LOOP-BACK dst,dst,src
2    RETURN     all  --  anywhere             anywhere             mark match ! 0x4000/0x4000
3    MARK       all  --  anywhere             anywhere             MARK xor 0x4000
4    MASQUERADE  all  --  anywhere             anywhere             /* kubernetes service traffic requiring SNAT */ random-fully

For service IP -> pod IP traffic, packets are NATed by FLANNEL-POSTRTG rule 4 and again by KUBE-POSTROUTING rule 4, so that's double NAT, right?

How can we NAT only once? Possible solution 1 (unmark packets once they have been MASQUERADEd):

Chain POSTROUTING (policy ACCEPT)
num  target     prot opt source
1    RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
2    MASQUERADE  all  --  10.244.0.0/16       !base-address.mcast.net/4  /* flanneld masq */ random-fully
3    RETURN     all  -- !10.244.0.0/16        k1/24                /* flanneld masq */
4    MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully

1    MARK       all  --  anywhere             anywhere             MARK xor 0x4000

1    MASQUERADE  all  --  anywhere             anywhere             /* Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose */ match-set KUBE-LOOP-BACK dst,dst,src
2    RETURN     all  --  anywhere             anywhere             mark match ! 0x4000/0x4000
3    MARK       all  --  anywhere             anywhere             MARK xor 0x4000
4    MASQUERADE  all  --  anywhere             anywhere             /* kubernetes service traffic requiring SNAT */ random-fully

How can we NAT only once? Possible solution 2 (make the KUBE-POSTROUTING rules precede FLANNEL-POSTRTG):

Chain POSTROUTING (policy ACCEPT)
num  target     prot opt source
1    MASQUERADE  all  --  anywhere             anywhere             /* Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose */ match-set KUBE-LOOP-BACK dst,dst,src
2    RETURN     all  --  anywhere             anywhere             mark match ! 0x4000/0x4000
3    MARK       all  --  anywhere             anywhere             MARK xor 0x4000
4    MASQUERADE  all  --  anywhere             anywhere             /* kubernetes service traffic requiring SNAT */ random-fully

1    RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
2    MASQUERADE  all  --  10.244.0.0/16       !base-address.mcast.net/4  /* flanneld masq */ random-fully
3    RETURN     all  -- !10.244.0.0/16        k1/24                /* flanneld masq */
4    MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully
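
One way to confirm which MASQUERADE rules a single request actually hits is to watch the per-rule packet counters around the test (this assumes little other traffic on the node, and <ClusterIP>:<port> is a placeholder):

# zero the nat counters, reproduce the stall, then see which MASQUERADE rules incremented
iptables -t nat -Z
curl -m 5 http://<ClusterIP>:<port>/
iptables -t nat -vL FLANNEL-POSTRTG --line-numbers
iptables -t nat -vL KUBE-POSTROUTING --line-numbers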

rbrtbnfgl (Contributor) commented:

OK, now it's clear. This bug only happens on some kernel versions, which is why it didn't show up on my setup. I'll try to update the iptables rules.

rbrtbnfgl (Contributor) commented:

Thanks @rkonfj for finding that. The fix will also speed up the MASQUERADE processing.
Could you test on your setup whether adding this iptables rule on the node fixes the issue?
iptables -t nat -I FLANNEL-POSTRTG -m mark --mark 0x4000/0x4000 -j RETURN
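
If it helps, after inserting it the rule should show up as entry 1 of the chain, which can be checked with:

# verify the new RETURN rule sits at position 1 of FLANNEL-POSTRTG
iptables -t nat -vL FLANNEL-POSTRTG --line-numbers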

rkonfj (Author) commented Nov 23, 2022

> Thanks @rkonfj for finding that. The fix will also speed up the MASQUERADE processing. Could you test on your setup whether adding this iptables rule on the node fixes the issue? iptables -t nat -I FLANNEL-POSTRTG -m mark --mark 0x4000/0x4000 -j RETURN

It works well.

rbrtbnfgl (Contributor) commented:

Thanks, I'll update the code to add this additional rule.

manuelbuil (Collaborator) commented:

@rkonfj we believe your kernel still has a vxlan bug, which is why you see this problem when double NATing. We can avoid it by not double-NATing, as @rbrtbnfgl suggests. But just to verify, with the original flannel iptables rules (and thus double NAT), could you run this on your nodes:

sudo ethtool -K flannel.1 tx-checksum-ip-generic off

And then try again. That should remove the vxlan bug from the equation, so it should work even with double NAT in place.
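
Note that this is a per-device setting and won't survive the flannel.1 interface being recreated; the current state can be checked with:

# confirm whether tx checksum offload is currently enabled on the vxlan device
ethtool -k flannel.1 | grep tx-checksum-ip-generic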

rkonfj (Author) commented Nov 28, 2022

> @rkonfj we believe your kernel still has a vxlan bug, which is why you see this problem when double NATing. We can avoid it by not double-NATing, as @rbrtbnfgl suggests. But just to verify, with the original flannel iptables rules (and thus double NAT), could you run this on your nodes:
>
> sudo ethtool -K flannel.1 tx-checksum-ip-generic off
>
> And then try again. That should remove the vxlan bug from the equation, so it should work even with double NAT in place.

Yes, it works.

JamesLavin commented Jan 3, 2023

  • Flannel version: v0.20.2
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: 3.3 (I believe)
  • Kubernetes version (if used): v1.26.0
  • Operating System and version: Ubuntu 22.04 (kernel version 5.15.0-56-generic)

I've been banging my head on a possibly related issue. I built a Kubernetes cluster from four Ubuntu 22.04 servers. Most things work fine, but a few things that seem to involve traffic leaving or entering the pod network are broken. I had to run metrics-server on hostNetwork to get it working, and I get this error when trying to create a Postgres cluster:

$ kubectl create -f my-cnpg-cluster.yaml
Error from server (InternalError): error when creating "my-cnpg-cluster.yaml": Internal error occurred: failed calling webhook "mcluster.kb.io": failed to call webhook: Post "https://cnpg-webhook-service.cnpg-system.svc:443/mutate-postgresql-cnpg-io-v1-cluster?timeout=10s": context deadline exceeded
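
Since the admission webhook call is made by the kube-apiserver (which typically runs on the host network of a control-plane node), the same path can be exercised directly from that node; this is just a sketch, with the service name taken from the error above:

# look up the webhook service's ClusterIP, then try reaching it from the control-plane node
kubectl -n cnpg-system get svc cnpg-webhook-service
curl -vk -m 10 https://<webhook ClusterIP>:443/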

I suspected a NAT/masquerading issue and noticed that the first FLANNEL-POSTRTG rule (below) seems to override all subsequent rules (though I'm not super knowledgeable about iptables and may be misinterpreting).

Here's what it looked like before I changed anything:

$ sudo iptables -t nat -L

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
FLANNEL-POSTRTG  all  --  anywhere             anywhere             /* flanneld masq */
KUBE-POSTROUTING  all  --  anywhere             anywhere             /* kubernetes postrouting rules */

Chain FLANNEL-POSTRTG (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere             /* flanneld masq */
RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
MASQUERADE  all  --  10.244.0.0/16       !base-address.mcast.net/4  /* flanneld masq */ random-fully
RETURN     all  -- !10.244.0.0/16        williams/24          /* flanneld masq */
MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere             mark match ! 0x4000/0x4000
MARK       all  --  anywhere             anywhere             MARK xor 0x4000
MASQUERADE  all  --  anywhere             anywhere             /* kubernetes service traffic requiring SNAT */ random-fully

Annoyingly, I have twice tried adding the rule at the end and removing it from the beginning, but something (Flannel, I suppose) keeps recreating it at the top. I do this:

sudo iptables -t nat -I FLANNEL-POSTRTG 6 -s 0.0.0.0/0 -d 0.0.0.0/0 -j RETURN -m comment --comment "flanneld masq"
sudo iptables -t nat -D FLANNEL-POSTRTG 1

Then I somehow wind up with this:

Chain FLANNEL-POSTRTG (1 references)
num  target     prot opt source               destination
1    RETURN     all  --  0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
2    RETURN     all  --  0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
3    RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
4    MASQUERADE  all  --  10.244.0.0/16       !224.0.0.0/4          /* flanneld masq */ random-fully
5    RETURN     all  -- !10.244.0.0/16        10.244.0.0/24        /* flanneld masq */
6    MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully

FWIW, my /etc/cni/net.d/10-flannel.conflist looks identical to @rkonfj's.

Any thoughts would be greatly appreciated. :-)

rbrtbnfgl (Contributor) commented:

It shouldn't be the same issue. This issue was related to the UDP checksum, which should be fixed in v0.20.2.
The FLANNEL-POSTRTG chain should only match packets going from the pods to the internet.
Could you create a new issue with the steps you took so I can try to reproduce it?

JamesLavin commented:

Thank you for taking time to reply, @rbrtbnfgl. (And thank you for your work on Flannel!)

At work now but will document my set-up as best I can tonight in a new issue.

I suspected the first rule in my FLANNEL-POSTRTG chain didn't belong there, which is what led me to this GitHub issue. I don't see such a rule in the chains posted in this thread, which seems to support my hunch and your feeling that I'm hitting something else. But the rule reappears every time I delete it, so I can't even test my hunch that it's the cause:

Chain FLANNEL-POSTRTG (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere             /* flanneld masq */

rbrtbnfgl (Contributor) commented:

You can increase the verbosity of iptables output if you use -vL.

JamesLavin commented:

Thanks! That's cool, @rbrtbnfgl. I ran -vL right before and right after running kubectl create -f my-cnpg-cluster.yaml, then diffed the output, which should show which rules my packets are hitting:

3c3
< 1     208K   28M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
---
> 1     209K   29M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
14c14
< 1      413 25422 FLANNEL-POSTRTG  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
---
> 1      672 41260 FLANNEL-POSTRTG  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
20c20
< 2       87  5220 RETURN     all  --  *      *       10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
---
> 2      141  8460 RETURN     all  --  *      *       10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
23c23
< 5        4   240 MASQUERADE  all  --  *      *      !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully
---
> 5        7   420 MASQUERADE  all  --  *      *      !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully
41c41
< 1     2902  181K RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000/0x4000
---
> 1     3158  196K RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000/0x4000
197c197
< 23    3298  277K KUBE-NODEPORTS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
---
> 23    3594  302K KUBE-NODEPORTS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

The FLANNEL-POSTRTG diff seems to show that ALL packets are matching that first FLANNEL-POSTRTG rule I was suspicious of, so they never get masqueraded. This is consistent with my theory. I wish I could figure out how to delete that rule without it being recreated. (Again, I'll document my setup in a new issue tonight when I'm not doing my day job.)

rbrtbnfgl (Contributor) commented:

This issue should be fixed with Flannel v0.20.2.
I'll close it.
