fix: fail to access NodePort when pod owning multiple network cards #3686

Merged

Conversation

cyclinder
Collaborator

@cyclinder cyclinder commented Jul 2, 2024

Add a `from` policy route for the pod's eth0, which makes sure that packets received on eth0 are sent back out through eth0. This fixes the problem of inconsistent (asymmetric) routes.

Here is the case: a pod owns multiple interfaces, eth0 (calico, 172.16.1.0/24) and eth1 (macvlan, 192.168.1.0/24), and a node with IP 192.168.1.10 visits the pod's calico IP for NodePort access or a health check. Without this fix, the reply packet does not go back out through eth0.

Thanks for contributing!

What type of PR is this?

  • release/bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #3683
Special notes for your reviewer:

Pod routing table

root@demo1-7fbdd6cc66-bwgtn:/# ip r
default via 169.254.1.1 dev eth0
10.7.0.0/16 dev net1 proto kernel scope link src 10.7.168.202
10.7.168.71 dev eth0 scope link src 10.233.74.111
10.233.0.0/18 via 10.7.168.71 dev eth0 src 10.233.74.111
10.233.64.0/18 via 10.7.168.71 dev eth0 src 10.233.74.111
10.233.74.64 dev eth0 scope link src 10.233.74.111
169.254.0.0/16 via 10.7.168.71 dev eth0 src 10.233.74.111
169.254.1.1 dev eth0 scope link
172.224.168.71 dev eth0 scope link src 10.233.74.111
root@demo1-7fbdd6cc66-bwgtn:/# ip rule
0:	from all lookup local
32764:	from 10.233.74.111 lookup 500    # ===>> new rule; 10.233.74.111 is the IP of the pod's eth0
32765:	from 10.7.168.202 lookup 100
32766:	from all lookup main
32767:	from all lookup default
root@demo1-7fbdd6cc66-bwgtn:/# ip r show table 500
default via 169.254.1.1 dev eth0 # ===>> newly added
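For reference, the new rule and table shown above amount to the following manual configuration (a sketch of the equivalent `ip` commands, using this pod's addresses; the coordinator installs them via netlink rather than shelling out):

```shell
# Policy rule: traffic sourced from eth0's IP consults table 500 first.
ip rule add from 10.233.74.111 lookup 500 pref 32764

# Table 500 holds only the overlay default route via eth0, so replies to
# packets that arrived on eth0 leave through eth0 again.
ip route add default via 169.254.1.1 dev eth0 table 500
```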


codecov bot commented Jul 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.16%. Comparing base (0cdac43) to head (4b166d9).
Report is 4 commits behind head on main.

Additional details and impacted files


@@           Coverage Diff           @@
##             main    #3686   +/-   ##
=======================================
  Coverage   81.16%   81.16%           
=======================================
  Files          50       50           
  Lines        4391     4391           
=======================================
  Hits         3564     3564           
  Misses        670      670           
  Partials      157      157           
Flag Coverage Δ
unittests 81.16% <ø> (ø)

Flags with carried forward coverage won't be shown.

// copy to table 500
for idx := range defaultInterfaceAddress {
	ipNet := networking.ConvertMaxMaskIPNet(defaultInterfaceAddress[idx].IP)
	err = networking.AddFromRuleTable(ipNet, c.hostRuleTable)
Collaborator

@weizhoublue weizhoublue Jul 2, 2024


Referencing a variable named hostRuleTable here is ambiguous; this should have nothing to do with the host, right? The variable naming should be cleaned up.

@weizhoublue
Collaborator

The PR description should give a detailed example of the routing changes, to make the PR easier to review and maintain.

// see https://github.com/spidernet-io/spiderpool/issues/3683
copyOverlayDefaultRoute = true

// copy to table 500
Collaborator


Why 500? How were the table numbers of the existing policy routes defined and standardized?

Collaborator


Why not follow the existing mustGetRuleNumber convention, where route table numbers start from 100?

Collaborator Author


mustGetRuleNumber only returns the table for the current interface; it cannot tell us which table the earlier interface (eth0) should be in. hostRuleTable = 500 is just borrowed here.

@weizhoublue
Collaborator

This change is fairly large; we need to review the test-case coverage carefully to avoid fixing one problem while introducing another.

@@ -151,7 +151,7 @@ func AddRoute(logger *zap.Logger, ruleTable, ipFamily int, scope netlink.Scope,

// MoveRouteTable move all routes of the specified interface to a new route table
// Equivalent: `ip route del <route>` and `ip route add <route> table <table>`
-func MoveRouteTable(logger *zap.Logger, iface string, srcRuleTable, dstRuleTable, ipfamily int) error {
+func MoveRouteTable(logger *zap.Logger, iface string, srcRuleTable, dstRuleTable, hostRuleTable, ipfamily int, copyOverlayDefaultRoute bool) error {
Collaborator

@weizhoublue weizhoublue Jul 2, 2024


These parameter names greatly reduce the readability of this function.
This big function is starting to feel like a grab bag; consider a cleaner API design.

Collaborator Author


I wanted to reuse one function in two places, and the original function already had plenty of parameters.

Collaborator Author


Updated; it is now split into two functions.

@cyclinder
Collaborator Author

The diff looks large, but it really only adds one thing. After evaluation, the existing e2e tests already cover most cases; the problem wasn't caught before because of an environment configuration issue. Adding one more connectivity case with a specified default NIC is enough.

@cyclinder cyclinder force-pushed the coordinator/overlay_policy_routing branch from 1295da7 to 95ee34d on July 2, 2024 11:33
@cyclinder cyclinder force-pushed the coordinator/overlay_policy_routing branch 3 times, most recently from e2d87c7 to 9b7b289 on July 4, 2024 08:50
@weizhoublue
Collaborator

A PR title should describe the symptom or problem being fixed, not the technical change that was made. Otherwise it can't be reviewed in the release notes, and users can't tell how the PR helps them.

@weizhoublue
Collaborator

Have you checked the case where eth0 is from macvlan?

@weizhoublue weizhoublue changed the title coordinator: add a from policy route for pod's eth0 fix: fail to access NodePort when pod owning multiple network cards Jul 8, 2024
@weizhoublue
Collaborator

Does this need any documentation changes?

@cyclinder
Collaborator Author

Have you checked the case where eth0 is from macvlan?

These changes aren't related to the case where eth0 is from macvlan; they only concern the case where eth0 is from calico.

@weizhoublue
Collaborator

Have you checked the case where eth0 is from macvlan?

These changes aren't related to the case where eth0 is from macvlan; they only concern the case where eth0 is from calico.

Does it really not affect the case with multiple macvlan interfaces? I'm not sure I see it that way.

Add a `from` policy route for the pod's eth0, which makes sure that packets received on eth0 are sent back out through eth0. This fixes the problem of inconsistent (asymmetric) routes.

Signed-off-by: cyclinder <qifeng.guo@daocloud.io>
@cyclinder
Collaborator Author

Yes. In the case of multiple macvlan NICs accessing the NodePort, packets come in via veth0, and we already have iptables rules and policy-based routing to ensure reply packets go back out via veth0. So that case is not affected by this PR.

@cyclinder cyclinder force-pushed the coordinator/overlay_policy_routing branch from 9b7b289 to 4b166d9 on July 8, 2024 07:08
@weizhoublue
Collaborator

weizhoublue commented Jul 9, 2024

Yes. In the case of multiple macvlan NICs accessing the NodePort, packets come in via veth0, and we already have iptables rules and policy-based routing to ensure reply packets go back out via veth0. So that case is not affected by this PR.

Actually, I mean: suppose a pod owns eth0 172.16.2.11 and eth1 172.16.3.11. When a remote host 172.16.3.10 visits the pod's address 172.16.2.11, the request packet ingresses on eth0 but the reply packet egresses on eth1. Can that happen?
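Whether that happens is decided by the pod's policy rules: the kernel scans them in ascending priority, and the first rule whose `from` selector matches the reply's source IP picks the routing table. A toy Go model of that walk (illustrative only, not spiderpool code), using the rule set from the `ip rule` output earlier in this PR:

```go
package main

import "fmt"

// rule is a toy model of one Linux policy-routing rule.
type rule struct {
	prio  int
	from  string // "" means "from all"
	table string
}

// lookupTable scans rules (assumed sorted by ascending priority) and
// returns the table selected by the first rule matching the source IP.
func lookupTable(rules []rule, srcIP string) string {
	for _, r := range rules {
		if r.from == "" || r.from == srcIP {
			return r.table
		}
	}
	return "unreachable"
}

func main() {
	// The pod's rules after this PR, from the `ip rule` output above.
	rules := []rule{
		{32764, "10.233.74.111", "500"}, // new: eth0's IP -> table 500 (default via eth0)
		{32765, "10.7.168.202", "100"},  // net1's IP -> table 100
		{32766, "", "main"},             // from all -> main
	}
	// A reply sourced from eth0's IP now hits table 500 and leaves via eth0.
	fmt.Println(lookupTable(rules, "10.233.74.111")) // 500
	// A reply from any other source still falls through to the main table.
	fmt.Println(lookupTable(rules, "172.16.3.11")) // main
}
```

Without the new priority-32764 rule, a reply sourced from eth0's IP would fall through to the main table, whose default route can point out a different interface.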

@cyclinder
Collaborator Author

If we need to forward through the host stack, then we need to make sure the forwarding path is consistent. For multiple macvlan underlay interfaces, this forwarding works well; I did some testing and it is fine.

➜  ~ ip netns exec net1 ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
1999: macvlan0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ce:3a:92:4f:d0:1b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.6.212.188/16 scope global macvlan0
       valid_lft forever preferred_lft forever
    inet6 fe80::cc3a:92ff:fe4f:d01b/64 scope link
       valid_lft forever preferred_lft forever
➜  ~ ip netns exec net1 ping 10.7.212.207 -c 2
PING 10.7.212.207 (10.7.212.207) 56(84) bytes of data.
64 bytes from 10.7.212.207: icmp_seq=1 ttl=64 time=0.465 ms
64 bytes from 10.7.212.207: icmp_seq=2 ttl=64 time=0.342 ms

--- 10.7.212.207 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.342/0.403/0.465/0.061 ms

@weizhoublue
Collaborator

weizhoublue commented Jul 11, 2024

10.7.212.207

I don't mean the local node of the pod, I mean a remote node.
So, when a pod owns two macvlan interfaces, is there a policy rule for the source IP of eth1, like there is in the calico case?

@cyclinder
Collaborator Author

Yes, I know this case. There is a remote client (10.6.212.188) and a macvlan pod with two macvlan interfaces (eth0: 10.7.212.201, eth1: 10.6.212.228). The client accesses eth0 (10.7.212.201) and the response is sent from eth1 (10.6.212.228); it works well.

@weizhoublue weizhoublue merged commit 93f2665 into spidernet-io:main Jul 17, 2024
54 checks passed

Successfully merging this pull request may close these issues.

node failed to access to calico ip of the multi-nic pod if calico nic is the default-route nic