fix: fail to access NodePort when pod owning multiple network cards #3686

Merged

Conversation

cyclinder
Collaborator

@cyclinder cyclinder commented Jul 2, 2024

Add a `from` policy route for the pod's eth0, which makes sure that packets received on eth0 are sent back out through eth0. This fixes the problem of inconsistent (asymmetric) routes.

Here is the case: a pod owns multiple interfaces, eth0 (calico, 172.16.1.0/24) and eth1 (macvlan, 192.168.1.0/24), and a node with IP 192.168.1.10 visits the pod's calico IP for NodePort access or a health check. Without this fix, the reply packet does not go back out through eth0.

Thanks for contributing!

What type of PR is this?

  • release/bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #3683
Special notes for your reviewer:

Pod routing table

root@demo1-7fbdd6cc66-bwgtn:/# ip r
default via 169.254.1.1 dev eth0
10.7.0.0/16 dev net1 proto kernel scope link src 10.7.168.202
10.7.168.71 dev eth0 scope link src 10.233.74.111
10.233.0.0/18 via 10.7.168.71 dev eth0 src 10.233.74.111
10.233.64.0/18 via 10.7.168.71 dev eth0 src 10.233.74.111
10.233.74.64 dev eth0 scope link src 10.233.74.111
169.254.0.0/16 via 10.7.168.71 dev eth0 src 10.233.74.111
169.254.1.1 dev eth0 scope link
172.224.168.71 dev eth0 scope link src 10.233.74.111
root@demo1-7fbdd6cc66-bwgtn:/# ip rule
0:	from all lookup local
32764:	from 10.233.74.111 lookup 500    # ===>> new rule; 10.233.74.111 is the IP of the pod's eth0
32765:	from 10.7.168.202 lookup 100
32766:	from all lookup main
32767:	from all lookup default
root@demo1-7fbdd6cc66-bwgtn:/# ip r show table 500
default via 169.254.1.1 dev eth0 # ===>> newly added
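For reference, the new rule and table shown above amount to the following manual configuration (a sketch of the equivalent `ip` commands, using this pod's addresses; the coordinator installs them via netlink rather than shelling out):

```shell
# Policy rule: traffic sourced from eth0's IP consults table 500 first.
ip rule add from 10.233.74.111 lookup 500 pref 32764

# Table 500 holds only the overlay default route via eth0, so replies to
# packets that arrived on eth0 leave through eth0 again.
ip route add default via 169.254.1.1 dev eth0 table 500
```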


codecov bot commented Jul 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.16%. Comparing base (0cdac43) to head (4b166d9).
Report is 4 commits behind head on main.

Additional details and impacted files


@@           Coverage Diff           @@
##             main    #3686   +/-   ##
=======================================
  Coverage   81.16%   81.16%           
=======================================
  Files          50       50           
  Lines        4391     4391           
=======================================
  Hits         3564     3564           
  Misses        670      670           
  Partials      157      157           
Flag Coverage Δ
unittests 81.16% <ø> (ø)

Flags with carried forward coverage won't be shown.

// copy to table 500
for idx := range defaultInterfaceAddress {
	ipNet := networking.ConvertMaxMaskIPNet(defaultInterfaceAddress[idx].IP)
	err = networking.AddFromRuleTable(ipNet, c.hostRuleTable)
Collaborator

@weizhoublue weizhoublue Jul 2, 2024


Referencing a variable named hostRuleTable here is ambiguous; this should have nothing to do with the host, right? The variable naming should be cleaned up.

@weizhoublue
Collaborator

The PR description should give a detailed example of the routing changes, to make the PR easier to review and maintain.

// see https://github.com/spidernet-io/spiderpool/issues/3683
copyOverlayDefaultRoute = true

// copy to table 500
Collaborator


Why 500? How were the table numbers of the existing policy routes defined and standardized?

Collaborator


Why not follow the existing mustGetRuleNumber convention, where route table numbers start from 100?

Collaborator Author


mustGetRuleNumber only returns the table for the current interface; it cannot tell us which table the earlier interface (eth0) should be in. hostRuleTable = 500 is just borrowed here.

@weizhoublue
Collaborator

This change is fairly large; we need to review the test-case coverage carefully to avoid fixing one problem while introducing another.

@@ -151,7 +151,7 @@ func AddRoute(logger *zap.Logger, ruleTable, ipFamily int, scope netlink.Scope,

// MoveRouteTable move all routes of the specified interface to a new route table
// Equivalent: `ip route del <route>` and `ip route add <route> table <table>`
-func MoveRouteTable(logger *zap.Logger, iface string, srcRuleTable, dstRuleTable, ipfamily int) error {
+func MoveRouteTable(logger *zap.Logger, iface string, srcRuleTable, dstRuleTable, hostRuleTable, ipfamily int, copyOverlayDefaultRoute bool) error {
Collaborator

@weizhoublue weizhoublue Jul 2, 2024


These parameter names greatly reduce the readability of this function.
This big function is starting to feel like a grab bag; consider a cleaner API design.

Collaborator Author


I wanted to reuse one function in two places, and the original function already had plenty of parameters.

Collaborator Author


Updated; it is now split into two functions.

@cyclinder
Collaborator Author

The diff looks large, but it really only adds one thing. After evaluation, the existing e2e tests already cover most cases; the problem wasn't caught before because of an environment configuration issue. Adding one more connectivity case with a specified default NIC is enough.

@cyclinder cyclinder force-pushed the coordinator/overlay_policy_routing branch from 1295da7 to 95ee34d on July 2, 2024 11:33
@cyclinder cyclinder force-pushed the coordinator/overlay_policy_routing branch 3 times, most recently from e2d87c7 to 9b7b289 on July 4, 2024 08:50
@weizhoublue
Collaborator

A PR title should describe the symptom or problem being fixed, not the technical change that was made. Otherwise it can't be reviewed in the release notes, and users can't tell how the PR helps them.

@weizhoublue
Collaborator

Have you checked the case where eth0 is from macvlan?

@weizhoublue weizhoublue changed the title coordinator: add a from policy route for pod's eth0 fix: fail to access NodePort when pod owning multiple network cards Jul 8, 2024
@weizhoublue
Collaborator

Does this need any documentation changes?

@cyclinder
Collaborator Author

Have you checked the case where eth0 is from macvlan?

These changes aren't related to the case where eth0 is from macvlan; they only concern the case where eth0 is from calico.

@weizhoublue
Collaborator

Have you checked the case where eth0 is from macvlan?

These changes aren't related to the case where eth0 is from macvlan; they only concern the case where eth0 is from calico.

Does it really not affect the case with multiple macvlan interfaces? I'm not sure I see it that way.

Add a `from` policy route for the pod's eth0, which makes sure that packets received on eth0 are sent back out through eth0. This fixes the problem of inconsistent (asymmetric) routes.

Signed-off-by: cyclinder <qifeng.guo@daocloud.io>
@cyclinder
Collaborator Author

Yes. In the case of multiple macvlan NICs accessing the NodePort, packets come in via veth0, and we already have iptables rules and policy-based routing to ensure reply packets go back out via veth0. So that case is not affected by this PR.

@cyclinder cyclinder force-pushed the coordinator/overlay_policy_routing branch from 9b7b289 to 4b166d9 on July 8, 2024 07:08
@weizhoublue
Collaborator

weizhoublue commented Jul 9, 2024

Yes. In the case of multiple macvlan NICs accessing the NodePort, packets come in via veth0, and we already have iptables rules and policy-based routing to ensure reply packets go back out via veth0. So that case is not affected by this PR.

Actually, I mean: suppose a pod owns eth0 172.16.2.11 and eth1 172.16.3.11. When a remote host 172.16.3.10 visits the pod's address 172.16.2.11, the request packet ingresses on eth0 but the reply packet egresses on eth1. Can that happen?
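Whether that happens is decided by the pod's policy rules: the kernel scans them in ascending priority, and the first rule whose `from` selector matches the reply's source IP picks the routing table. A toy Go model of that walk (illustrative only, not spiderpool code), using the rule set from the `ip rule` output earlier in this PR:

```go
package main

import "fmt"

// rule is a toy model of one Linux policy-routing rule.
type rule struct {
	prio  int
	from  string // "" means "from all"
	table string
}

// lookupTable scans rules (assumed sorted by ascending priority) and
// returns the table selected by the first rule matching the source IP.
func lookupTable(rules []rule, srcIP string) string {
	for _, r := range rules {
		if r.from == "" || r.from == srcIP {
			return r.table
		}
	}
	return "unreachable"
}

func main() {
	// The pod's rules after this PR, from the `ip rule` output above.
	rules := []rule{
		{32764, "10.233.74.111", "500"}, // new: eth0's IP -> table 500 (default via eth0)
		{32765, "10.7.168.202", "100"},  // net1's IP -> table 100
		{32766, "", "main"},             // from all -> main
	}
	// A reply sourced from eth0's IP now hits table 500 and leaves via eth0.
	fmt.Println(lookupTable(rules, "10.233.74.111")) // 500
	// A reply from any other source still falls through to the main table.
	fmt.Println(lookupTable(rules, "172.16.3.11")) // main
}
```

Without the new priority-32764 rule, a reply sourced from eth0's IP would fall through to the main table, whose default route can point out a different interface.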

@cyclinder
Collaborator Author

If we need to forward through the host stack, then we need to make sure the forwarding path is consistent. For multiple macvlan underlay interfaces, this forwarding works well; I did some testing and it is fine.

➜  ~ ip netns exec net1 ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
1999: macvlan0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ce:3a:92:4f:d0:1b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.6.212.188/16 scope global macvlan0
       valid_lft forever preferred_lft forever
    inet6 fe80::cc3a:92ff:fe4f:d01b/64 scope link
       valid_lft forever preferred_lft forever
➜  ~ ip netns exec net1 ping 10.7.212.207 -c 2
PING 10.7.212.207 (10.7.212.207) 56(84) bytes of data.
64 bytes from 10.7.212.207: icmp_seq=1 ttl=64 time=0.465 ms
64 bytes from 10.7.212.207: icmp_seq=2 ttl=64 time=0.342 ms

--- 10.7.212.207 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.342/0.403/0.465/0.061 ms

@weizhoublue
Collaborator

weizhoublue commented Jul 11, 2024

10.7.212.207

I don't mean the local node of the pod, I mean a remote node.
So, when a pod owns two macvlan interfaces, is there a policy rule for the source IP of eth1, like there is in the calico case?

@cyclinder
Collaborator Author

Yes, I know this case. There is a remote client (10.6.212.188) and a macvlan pod with two macvlan interfaces (eth0: 10.7.212.201, eth1: 10.6.212.228). The client accesses eth0 (10.7.212.201) and the response is sent from eth1 (10.6.212.228); it works well.

@weizhoublue weizhoublue merged commit 93f2665 into spidernet-io:main Jul 17, 2024
54 checks passed

Successfully merging this pull request may close these issues.

node failed to access to calico ip of the multi-nic pod if calico nic is the default-route nic