60+ seconds stuck when calling an HTTP service pod #1679

Closed

rkonfj opened this issue Nov 21, 2022 · 18 comments

rkonfj commented Nov 21, 2022

After upgrading flanneld to v0.20.1, curling an HTTP service pod on a different node via its ClusterIP gets stuck for 60+ seconds.

Expected Behavior

No stall.

Current Behavior

The request stalls for 60+ seconds.

Possible Solution

It may be caused by double NAT, but I'm not sure.

Steps to Reproduce (for bugs)

curl gets stuck when the nat POSTROUTING chain is ordered like this:

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
FLANNEL-POSTRTG  all  --  anywhere             anywhere             /* flanneld masq */
KUBE-POSTROUTING  all  --  anywhere             anywhere             /* kubernetes postrouting rules */

It works fine when the order is reversed:

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
KUBE-POSTROUTING  all  --  anywhere             anywhere             /* kubernetes postrouting rules */
FLANNEL-POSTRTG  all  --  anywhere             anywhere             /* flanneld masq */
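
For reference, the current order on a node can be listed with explicit rule positions (the two chains above are the ones created by flanneld and kube-proxy respectively):

# show the nat POSTROUTING chain with rule positions
iptables -t nat -L POSTROUTING --line-numbers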

Context

This PR (kubernetes/kubernetes#92035) looks like it should solve this issue, but I still hit the problem with flanneld v0.20.1.

Your Environment

  • Flannel version: v0.20.1
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: 3.5.3
  • Kubernetes version (if used): v1.25.4
  • Operating System and version: Archlinux (kernel version 6.0.8)
  • Link to your project (optional):
@rbrtbnfgl rbrtbnfgl self-assigned this Nov 22, 2022
rbrtbnfgl (Contributor) commented Nov 22, 2022

Hi @rkonfj, thanks for reporting this.
I tried to set up a cluster with flannel and two pods deployed on two different nodes (one client and the other running nginx).
When I use wget from the client pod against the exposed service address, the IPs are correctly translated by kube-proxy and the connection works fine.
Could you give more info about your setup?
Can you check the output of kubectl get service -A?

rkonfj (Author) commented Nov 22, 2022

@rbrtbnfgl pod-to-pod via the service IP may work fine, but node-to-pod via the service IP gets stuck.
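
To make the repro concrete, this is the kind of call that stalls (a sketch; the address is a placeholder for any ClusterIP service whose backing pod runs on another node):

# run from a node's shell, not from inside a pod; <ClusterIP>:<port> is a placeholder
curl -v -m 70 http://<ClusterIP>:<port>/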

rbrtbnfgl (Contributor) commented:

I understand your issue, but I still can't reproduce it.
I deployed the nginx pod on one node, ran curl from a different node, and it worked.

rbrtbnfgl (Contributor) commented:

Could you share the content of /etc/cni/net.d/10-flannel.conflist?

rkonfj (Author) commented Nov 23, 2022

@rbrtbnfgl the content of this file is identical on every node:

# cat /etc/cni/net.d/10-flannel.conflist
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
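
If it helps, a quick way to double-check the file really is byte-identical across nodes (node1/node2 are placeholders for your hosts):

# compare the CNI config across nodes
for h in node1 node2; do ssh "$h" md5sum /etc/cni/net.d/10-flannel.conflist; done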

rkonfj (Author) commented Nov 23, 2022

@rbrtbnfgl let's confirm whether there is double NAT.

The nat rules created by flanneld v0.20.1 and kube-proxy v1.25.4 are as follows:

# iptables -t nat -L POSTROUTING  # expanded from `FLANNEL-POSTRTG` and `KUBE-POSTROUTING`
Chain POSTROUTING (policy ACCEPT)
num  target     prot opt source
1    RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
2    MASQUERADE  all  --  10.244.0.0/16       !base-address.mcast.net/4  /* flanneld masq */ random-fully
3    RETURN     all  -- !10.244.0.0/16        k1/24                /* flanneld masq */
4    MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully

1    MASQUERADE  all  --  anywhere             anywhere             /* Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose */ match-set KUBE-LOOP-BACK dst,dst,src
2    RETURN     all  --  anywhere             anywhere             mark match ! 0x4000/0x4000
3    MARK       all  --  anywhere             anywhere             MARK xor 0x4000
4    MASQUERADE  all  --  anywhere             anywhere             /* kubernetes service traffic requiring SNAT */ random-fully

For service IP -> pod IP traffic, packets are NATed by FLANNEL-POSTRTG rule 4 and again by KUBE-POSTROUTING rule 4, so that's double NAT, right?

How can we NAT only once? Possible solution 1 (unmark packets once they have been MASQUERADEd):

Chain POSTROUTING (policy ACCEPT)
num  target     prot opt source
1    RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
2    MASQUERADE  all  --  10.244.0.0/16       !base-address.mcast.net/4  /* flanneld masq */ random-fully
3    RETURN     all  -- !10.244.0.0/16        k1/24                /* flanneld masq */
4    MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully

1    MARK       all  --  anywhere             anywhere             MARK xor 0x4000

1    MASQUERADE  all  --  anywhere             anywhere             /* Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose */ match-set KUBE-LOOP-BACK dst,dst,src
2    RETURN     all  --  anywhere             anywhere             mark match ! 0x4000/0x4000
3    MARK       all  --  anywhere             anywhere             MARK xor 0x4000
4    MASQUERADE  all  --  anywhere             anywhere             /* kubernetes service traffic requiring SNAT */ random-fully

How can we NAT only once? Possible solution 2 (make the KUBE-POSTROUTING rules precede FLANNEL-POSTRTG):

Chain POSTROUTING (policy ACCEPT)
num  target     prot opt source
1    MASQUERADE  all  --  anywhere             anywhere             /* Kubernetes endpoints dst ip:port, source ip for solving hairpin purpose */ match-set KUBE-LOOP-BACK dst,dst,src
2    RETURN     all  --  anywhere             anywhere             mark match ! 0x4000/0x4000
3    MARK       all  --  anywhere             anywhere             MARK xor 0x4000
4    MASQUERADE  all  --  anywhere             anywhere             /* kubernetes service traffic requiring SNAT */ random-fully

1    RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
2    MASQUERADE  all  --  10.244.0.0/16       !base-address.mcast.net/4  /* flanneld masq */ random-fully
3    RETURN     all  -- !10.244.0.0/16        k1/24                /* flanneld masq */
4    MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully
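
One way to confirm which MASQUERADE rules a single request actually hits is to watch the per-rule packet counters around the test (this assumes little other traffic on the node, and <ClusterIP>:<port> is a placeholder):

# zero the nat counters, reproduce the stall, then see which MASQUERADE rules incremented
iptables -t nat -Z
curl -m 5 http://<ClusterIP>:<port>/
iptables -t nat -vL FLANNEL-POSTRTG --line-numbers
iptables -t nat -vL KUBE-POSTROUTING --line-numbers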

rbrtbnfgl (Contributor) commented:

OK, now it's clear. This bug only happens on some kernel versions, which is why it didn't show up on my setup. I'll try to update the iptables rules.

rbrtbnfgl (Contributor) commented:

Thanks @rkonfj for finding that. The fix will also speed up the MASQUERADE processing.
Could you test on your setup whether adding this iptables rule on the node fixes the issue?
iptables -t nat -I FLANNEL-POSTRTG -m mark --mark 0x4000/0x4000 -j RETURN
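
If it helps, after inserting it the rule should show up as entry 1 of the chain, which can be checked with:

# verify the new RETURN rule sits at position 1 of FLANNEL-POSTRTG
iptables -t nat -vL FLANNEL-POSTRTG --line-numbers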

rkonfj (Author) commented Nov 23, 2022

> Thanks @rkonfj for finding that. The fix will also speed up the MASQUERADE processing. Could you test on your setup whether adding this iptables rule on the node fixes the issue? iptables -t nat -I FLANNEL-POSTRTG -m mark --mark 0x4000/0x4000 -j RETURN

It works well.

rbrtbnfgl (Contributor) commented:

Thanks, I'll update the code to add this additional rule.

manuelbuil (Collaborator) commented:

@rkonfj we believe your kernel still has a vxlan bug, which is why you see this problem when double NATing. We can avoid it by not double-NATing, as @rbrtbnfgl suggests. But just to verify, with the original flannel iptables rules (and thus double NAT), could you run this on your nodes:

sudo ethtool -K flannel.1 tx-checksum-ip-generic off

And then try again. That should remove the vxlan bug from the equation, so it should work even with double NAT in place.
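
Note that this is a per-device setting and won't survive the flannel.1 interface being recreated; the current state can be checked with:

# confirm whether tx checksum offload is currently enabled on the vxlan device
ethtool -k flannel.1 | grep tx-checksum-ip-generic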

rkonfj (Author) commented Nov 28, 2022

> @rkonfj we believe your kernel still has a vxlan bug, which is why you see this problem when double NATing. We can avoid it by not double-NATing, as @rbrtbnfgl suggests. But just to verify, with the original flannel iptables rules (and thus double NAT), could you run this on your nodes:
>
> sudo ethtool -K flannel.1 tx-checksum-ip-generic off
>
> And then try again. That should remove the vxlan bug from the equation, so it should work even with double NAT in place.

Yes, it works.

JamesLavin commented Jan 3, 2023

  • Flannel version: v0.20.2
  • Backend used (e.g. vxlan or udp): vxlan
  • Etcd version: 3.3 (I believe)
  • Kubernetes version (if used): v1.26.0
  • Operating System and version: Ubuntu 22.04 (kernel version 5.15.0-56-generic)

I've been banging my head on a possibly related issue. I built a Kubernetes cluster from four Ubuntu 22.04 servers. Most things work fine, but a few things that seem to involve traffic leaving or entering the pod network are broken. I had to run metrics-server on hostNetwork to get it working, and I get this error when trying to create a Postgres cluster:

$ kubectl create -f my-cnpg-cluster.yaml
Error from server (InternalError): error when creating "my-cnpg-cluster.yaml": Internal error occurred: failed calling webhook "mcluster.kb.io": failed to call webhook: Post "https://cnpg-webhook-service.cnpg-system.svc:443/mutate-postgresql-cnpg-io-v1-cluster?timeout=10s": context deadline exceeded
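
Since the admission webhook call is made by the kube-apiserver (which typically runs on the host network of a control-plane node), the same path can be exercised directly from that node; this is just a sketch, with the service name taken from the error above:

# look up the webhook service's ClusterIP, then try reaching it from the control-plane node
kubectl -n cnpg-system get svc cnpg-webhook-service
curl -vk -m 10 https://<webhook ClusterIP>:443/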

I suspected a NAT/masquerading issue and noticed that the first FLANNEL-POSTRTG rule (below) seems to override all subsequent rules (though I'm not super knowledgeable about iptables and may be misinterpreting).

Here's what it looked like before I changed anything:

$ sudo iptables -t nat -L

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
FLANNEL-POSTRTG  all  --  anywhere             anywhere             /* flanneld masq */
KUBE-POSTROUTING  all  --  anywhere             anywhere             /* kubernetes postrouting rules */

Chain FLANNEL-POSTRTG (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere             /* flanneld masq */
RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
MASQUERADE  all  --  10.244.0.0/16       !base-address.mcast.net/4  /* flanneld masq */ random-fully
RETURN     all  -- !10.244.0.0/16        williams/24          /* flanneld masq */
MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere             mark match ! 0x4000/0x4000
MARK       all  --  anywhere             anywhere             MARK xor 0x4000
MASQUERADE  all  --  anywhere             anywhere             /* kubernetes service traffic requiring SNAT */ random-fully

Annoyingly, I have twice tried adding the rule at the end and removing it from the beginning, but something (Flannel, I suppose) keeps recreating it at the top. I do this:

sudo iptables -t nat -I FLANNEL-POSTRTG 6 -s 0.0.0.0/0 -d 0.0.0.0/0 -j RETURN -m comment --comment "flanneld masq"
sudo iptables -t nat -D FLANNEL-POSTRTG 1

Then I somehow wind up with this:

Chain FLANNEL-POSTRTG (1 references)
num  target     prot opt source               destination
1    RETURN     all  --  0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
2    RETURN     all  --  0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
3    RETURN     all  --  10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
4    MASQUERADE  all  --  10.244.0.0/16       !224.0.0.0/4          /* flanneld masq */ random-fully
5    RETURN     all  -- !10.244.0.0/16        10.244.0.0/24        /* flanneld masq */
6    MASQUERADE  all  -- !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully

FWIW, my /etc/cni/net.d/10-flannel.conflist looks identical to @rkonfj's.

Any thoughts would be greatly appreciated. :-)

rbrtbnfgl (Contributor) commented:

It shouldn't be the same issue. This issue was related to the UDP checksum, which should be fixed in v0.20.2.
The FLANNEL-POSTRTG chain should only match packets going from the pods to the internet.
Could you create a new issue with the steps you took so I can try to reproduce it?

JamesLavin commented:

Thank you for taking time to reply, @rbrtbnfgl. (And thank you for your work on Flannel!)

At work now but will document my set-up as best I can tonight in a new issue.

I suspected the first rule in my FLANNEL-POSTRTG chain didn't belong there, which is what led me to this GitHub issue. I don't see such a rule in the chains posted in this thread, which seems to support my hunch and your feeling that I'm hitting something else. But the rule reappears every time I delete it, so I can't even test my hunch that it's the cause:

Chain FLANNEL-POSTRTG (1 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere             /* flanneld masq */

rbrtbnfgl (Contributor) commented:

You can increase the verbosity of iptables output if you use -vL.

JamesLavin commented:

Thanks! That's cool, @rbrtbnfgl. I ran -vL right before and right after running kubectl create -f my-cnpg-cluster.yaml, then diffed the output, which should show which rules my packets are hitting:

3c3
< 1     208K   28M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
---
> 1     209K   29M KUBE-SERVICES  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */
14c14
< 1      413 25422 FLANNEL-POSTRTG  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
---
> 1      672 41260 FLANNEL-POSTRTG  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
20c20
< 2       87  5220 RETURN     all  --  *      *       10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
---
> 2      141  8460 RETURN     all  --  *      *       10.244.0.0/16        10.244.0.0/16        /* flanneld masq */
23c23
< 5        4   240 MASQUERADE  all  --  *      *      !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully
---
> 5        7   420 MASQUERADE  all  --  *      *      !10.244.0.0/16        10.244.0.0/16        /* flanneld masq */ random-fully
41c41
< 1     2902  181K RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000/0x4000
---
> 1     3158  196K RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match ! 0x4000/0x4000
197c197
< 23    3298  277K KUBE-NODEPORTS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
---
> 23    3594  302K KUBE-NODEPORTS  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

The FLANNEL-POSTRTG diff seems to show that ALL packets are matching that first FLANNEL-POSTRTG rule I was suspicious of, so they never get masqueraded. This is consistent with my theory. I wish I could figure out how to delete that rule without it being recreated. (Again, I'll document my setup in a new issue tonight when I'm not doing my day job.)

rbrtbnfgl (Contributor) commented:

This issue should be fixed with Flannel v0.20.2.
I'll close it.
