Merge pull request #2441 from cyclinder/docs/without_kube_proxy
docs: Accelerate access to service for underlay CNI
cyclinder committed Nov 7, 2023
2 parents 38ff19b + 78636ea commit 5f90186
Showing 13 changed files with 841 additions and 630 deletions.
111 changes: 111 additions & 0 deletions docs/concepts/coordinator-zh_CN.md
@@ -0,0 +1,111 @@
# Coordinator

[**English**](coordinator.md) | **简体中文**

Spiderpool incorporates a CNI meta-plugin called `coordinator` that runs after the Main CNI is invoked. It mainly provides the following features:

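- Resolve the problem of underlay Pods being unable to access ClusterIP
- Tune the routes of Pods with multiple NICs so that packets take the same path in both directions
- Detect whether a Pod's IP conflicts with another address
- Detect whether a Pod's gateway is reachable
- Fix the MAC address prefix of Pods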

The following sections describe in detail how `coordinator` resolves or implements each of these.

> If you create the NetworkAttachmentDefinition CR through the `SpiderMultusConfig` CR, you can configure `coordinator` (all fields) in `SpiderMultusConfig`. Refer to [SpiderMultusConfig](../reference/crd-spidermultusconfig.md).
>
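> The `SpiderCoordinator` CR serves as the global default configuration (all fields) of the `coordinator` plugin and has lower priority than the configuration in the NetworkAttachmentDefinition CR. Fields not configured in the NetworkAttachmentDefinition CR take their defaults from the `SpiderCoordinator` CR. For details, refer to [SpiderCoordinator](../reference/crd-spidercoordinator.md).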
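
The precedence rule above can be sketched as a per-field fallback. This is an illustration, not Spiderpool's code: it assumes the fallback is applied field by field (the text only says that `SpiderCoordinator` supplies defaults for whatever the NetworkAttachmentDefinition does not configure), and `CoordinatorOpts`, `pick`, and `merge` are hypothetical names.

```go
package main

import "fmt"

// CoordinatorOpts is a hypothetical, trimmed-down view of a few coordinator
// options. A nil pointer means "not set at this level".
type CoordinatorOpts struct {
	DetectIPConflict *bool
	DetectGateway    *bool
	PodMACPrefix     *string
}

// pick prefers the NetworkAttachmentDefinition value and falls back to the default.
func pick[T any](nad, def *T) *T {
	if nad != nil {
		return nad
	}
	return def
}

// merge resolves each field independently: NAD-level settings win, and the
// cluster-wide SpiderCoordinator defaults fill in whatever the NAD leaves unset.
func merge(nad, defaults CoordinatorOpts) CoordinatorOpts {
	return CoordinatorOpts{
		DetectIPConflict: pick(nad.DetectIPConflict, defaults.DetectIPConflict),
		DetectGateway:    pick(nad.DetectGateway, defaults.DetectGateway),
		PodMACPrefix:     pick(nad.PodMACPrefix, defaults.PodMACPrefix),
	}
}

func ptr[T any](v T) *T { return &v }

func main() {
	defaults := CoordinatorOpts{DetectIPConflict: ptr(false), DetectGateway: ptr(false), PodMACPrefix: ptr("")}
	nad := CoordinatorOpts{DetectGateway: ptr(true)} // the NAD only overrides detectGateway
	got := merge(nad, defaults)
	fmt.Println(*got.DetectIPConflict, *got.DetectGateway, *got.PodMACPrefix == "")
	// prints: false true true
}
```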

## Resolve the problem of underlay Pods being unable to access ClusterIP

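When using underlay CNIs such as Macvlan, IPvlan, or SR-IOV, Pods are often unable to access ClusterIP. This is usually because traffic from an underlay Pod to a ClusterIP has to go through the gateway on the switch, and that gateway has no route to the ClusterIP, so the traffic cannot reach it.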

For details on underlay Pods being unable to access ClusterIP, refer to [Underlay CNI Access Service](../usage/underlay_cni_service-zh_CN.md).

## Detect Pod IP conflicts (alpha)

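IP conflicts are unacceptable in underlay networks and can cause serious problems. When a Pod is created, `coordinator` can detect whether its IP conflicts with another address, for both IPv4 and IPv6. It sends ARP or NDP probe packets; if the MAC address in a reply is not the Pod's own, the IP is considered to be in conflict, and creation of the Pod is rejected.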

We can configure it via SpiderMultusConfig:

```yaml
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderMultusConfig
metadata:
  name: detect-ip
  namespace: default
spec:
  cniType: macvlan
  macvlan:
    master: ["eth0"]
  coordinator:
    detectIPConflict: true # Enable detectIPConflict
```
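
## Detect Pod gateway reachability (alpha)

In an underlay network, a Pod reaches external networks through a gateway. If the gateway is unreachable, the Pod is effectively cut off from the outside world. Sometimes we want to make sure a Pod's gateway is reachable when the Pod is created. `coordinator` can check whether the Pod's gateway is reachable, for both IPv4 and IPv6 gateway addresses, by sending ICMP packets to the gateway. If the gateway is unreachable, Pod creation is blocked.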

We can configure it via SpiderMultusConfig:

```yaml
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderMultusConfig
metadata:
  name: detect-gateway
  namespace: default
spec:
  cniType: macvlan
  macvlan:
    master: ["eth0"]
  enableCoordinator: true
  coordinator:
    detectGateway: true # Enable detectGateway
```

> Note: Some switches do not allow ARP probing and raise an alarm when probed. In this case, set `detectGateway` to false.

## Fix the MAC address prefix of Pods (alpha)

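Some legacy applications tie their behavior to a fixed MAC address or IP address. For example, a license server may issue licenses based on an application's fixed MAC or IP address; if the Pod's MAC address changes, the issued license may become invalid. In such cases, the Pod's MAC address needs to stay fixed. Spiderpool can fix the MAC address of an application through `coordinator`: the MAC address is composed of a configured prefix (2 bytes) followed by 4 bytes derived from the Pod's IP.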

Note:

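> Currently, only Pods using Macvlan or SR-IOV as the CNI are supported. In IPvlan L2 mode, the primary interface and its sub-interfaces share the same MAC address, so it cannot be modified.
>
> The MAC address is composed of the configured prefix (2 bytes) plus 4 bytes converted from the Pod's IP. An IPv4 address is 4 bytes long and converts entirely into the last 4 bytes of the MAC address, one hexadecimal octet per IPv4 octet. For IPv6 addresses, only the last 4 bytes are used.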

We can configure it via SpiderMultusConfig:

```yaml
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderMultusConfig
metadata:
  name: overwrite-mac
  namespace: default
spec:
  cniType: macvlan
  macvlan:
    master: ["eth0"]
  enableCoordinator: true
  coordinator:
    podMACPrefix: "0a:1b" # Fixed MAC address prefix (2 bytes)
```

After the Pod is created, we can check whether its MAC address starts with the prefix "0a:1b".

## Known issues

- Underlay mode: TCP communication between underlay Pods and overlay Pods (Calico or Cilium) fails

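This issue is caused by asymmetric packet paths. The request packet matches the routes on the source Pod side and is forwarded through `veth0` to the host, which then forwards it to the target Pod. The target Pod sees the source Pod's underlay IP as the packet's source IP, so its reply goes directly over the underlay network instead of passing through the source Pod's host.
The host therefore considers the traffic invalid (it unexpectedly receives the second packet of the TCP handshake, which conntrack marks as invalid), and an iptables rule installed by kube-proxy explicitly drops it. Currently, this can be worked around by switching kube-proxy to ipvs mode. The issue is expected to be fixed in Kubernetes 1.29.
When the sysctl `nf_conntrack_tcp_be_liberal` is set to 1, kube-proxy does not install this DROP rule.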

- Overlay mode: when a Pod has multiple NICs and the cluster's default CNI is Cilium, the Pod's underlay NIC cannot communicate with the node.

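We rely on the default CNI to create a Veth device so that the Pod's underlay IP can communicate with the node (normally, a Macvlan interface in bridge mode cannot communicate directly with its parent interface). However, Cilium does not allow IPs outside Cilium's subnets to be forwarded through the Veth device.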
110 changes: 110 additions & 0 deletions docs/concepts/coordinator.md
@@ -0,0 +1,110 @@
# Coordinator

**English** | [**简体中文**](coordinator-zh_CN.md)

Spiderpool incorporates a CNI meta-plugin called `coordinator` that runs after the main CNI is invoked. It mainly offers the following features:

- Resolve the problem of underlay Pods being unable to access ClusterIP
- Coordinate the routing for Pods with multiple NICs, ensuring consistent packet paths in both directions
- Detect Pod IP conflicts
- Check the reachability of Pod gateways
- Support fixed MAC address prefixes for Pods

Let's delve into how coordinator implements these features.

> If a NetworkAttachmentDefinition CR is created via the `SpiderMultusConfig` CR, you can configure `coordinator` (all fields) in `SpiderMultusConfig`. For more information, refer to [SpiderMultusConfig](../reference/crd-spidermultusconfig.md).
>
> The `SpiderCoordinator` CR serves as the global default configuration (all fields) for `coordinator`, with lower priority than the settings in the NetworkAttachmentDefinition CR. Any field not configured in the NetworkAttachmentDefinition CR takes its value from the `SpiderCoordinator` CR. For details, refer to [SpiderCoordinator](../reference/crd-spidercoordinator.md).
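
The precedence rule above can be sketched as a per-field fallback. This is an illustration, not Spiderpool's code: it assumes the fallback is applied field by field (the text only says that `SpiderCoordinator` supplies defaults for whatever the NetworkAttachmentDefinition does not configure), and `CoordinatorOpts`, `pick`, and `merge` are hypothetical names.

```go
package main

import "fmt"

// CoordinatorOpts is a hypothetical, trimmed-down view of a few coordinator
// options. A nil pointer means "not set at this level".
type CoordinatorOpts struct {
	DetectIPConflict *bool
	DetectGateway    *bool
	PodMACPrefix     *string
}

// pick prefers the NetworkAttachmentDefinition value and falls back to the default.
func pick[T any](nad, def *T) *T {
	if nad != nil {
		return nad
	}
	return def
}

// merge resolves each field independently: NAD-level settings win, and the
// cluster-wide SpiderCoordinator defaults fill in whatever the NAD leaves unset.
func merge(nad, defaults CoordinatorOpts) CoordinatorOpts {
	return CoordinatorOpts{
		DetectIPConflict: pick(nad.DetectIPConflict, defaults.DetectIPConflict),
		DetectGateway:    pick(nad.DetectGateway, defaults.DetectGateway),
		PodMACPrefix:     pick(nad.PodMACPrefix, defaults.PodMACPrefix),
	}
}

func ptr[T any](v T) *T { return &v }

func main() {
	defaults := CoordinatorOpts{DetectIPConflict: ptr(false), DetectGateway: ptr(false), PodMACPrefix: ptr("")}
	nad := CoordinatorOpts{DetectGateway: ptr(true)} // the NAD only overrides detectGateway
	got := merge(nad, defaults)
	fmt.Println(*got.DetectIPConflict, *got.DetectGateway, *got.PodMACPrefix == "")
	// prints: false true true
}
```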

## Resolve the problem of underlay Pods being unable to access ClusterIP (beta)

When using underlay CNIs such as Macvlan, IPvlan, or SR-IOV, a common problem is that underlay Pods cannot access ClusterIP. This happens because traffic from an underlay Pod to a ClusterIP is routed through the gateway on the switch, and in many cases that gateway has no route to the ClusterIP, so the traffic cannot reach it.

For more information about why underlay Pods cannot access ClusterIP, refer to [Underlay CNI Access Service](../usage/underlay_cni_service.md).

## Detect Pod IP conflicts (alpha)

IP conflicts are unacceptable in underlay networks and can cause serious problems. When a Pod is created, `coordinator` can detect whether its IP conflicts with another address, for both IPv4 and IPv6. It sends ARP or NDP probe packets; if the MAC address in a reply is not the Pod's own, the IP is considered to be in conflict, and creation of the Pod is rejected.

We can configure it via SpiderMultusConfig:

```yaml
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderMultusConfig
metadata:
  name: detect-ip
  namespace: default
spec:
  cniType: macvlan
  macvlan:
    master: ["eth0"]
  coordinator:
    detectIPConflict: true # Enable detectIPConflict
```

## Detect Pod gateway reachability (alpha)

In an underlay network, a Pod reaches external networks through a gateway. If the gateway is unreachable, the Pod is effectively cut off from the outside world. Sometimes we want to make sure a Pod's gateway is reachable when the Pod is created. `coordinator` can check whether the Pod's gateway is reachable, for both IPv4 and IPv6 gateway addresses, by sending ICMP packets to the gateway. If the gateway is unreachable, Pod creation is blocked.

We can configure it via SpiderMultusConfig:

```yaml
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderMultusConfig
metadata:
  name: detect-gateway
  namespace: default
spec:
  cniType: macvlan
  macvlan:
    master: ["eth0"]
  enableCoordinator: true
  coordinator:
    detectGateway: true # Enable detectGateway
```

> Note: Some switches do not allow ARP probing and raise an alarm when probed. In this case, set `detectGateway` to false.

## Fix MAC address prefix for Pods (alpha)

Some legacy applications tie their behavior to a fixed MAC address or IP address. For example, a license server may issue licenses based on an application's fixed MAC or IP address; if the Pod's MAC address changes, the issued license may become invalid. In such cases, the Pod's MAC address needs to stay fixed. Spiderpool can fix the MAC address of an application through `coordinator`: the MAC address is composed of a configured prefix (2 bytes) followed by 4 bytes derived from the Pod's IP.

Note:

> Currently, only Pods using Macvlan or SR-IOV as the CNI are supported. In IPvlan L2 mode, the primary interface and its sub-interfaces share the same MAC address, so it cannot be modified.
>
> The MAC address is composed of the configured prefix (2 bytes) plus 4 bytes converted from the Pod's IP. An IPv4 address is 4 bytes long and converts entirely into the last 4 bytes of the MAC address, one hexadecimal octet per IPv4 octet. For IPv6 addresses, only the last 4 bytes are used.

We can configure it via SpiderMultusConfig:

```yaml
apiVersion: spiderpool.spidernet.io/v2beta1
kind: SpiderMultusConfig
metadata:
  name: overwrite-mac
  namespace: default
spec:
  cniType: macvlan
  macvlan:
    master: ["eth0"]
  enableCoordinator: true
  coordinator:
    podMACPrefix: "0a:1b" # Fixed MAC address prefix (2 bytes)
```

After a Pod is created, you can check whether its MAC address starts with the prefix "0a:1b".

## Known issues

- Underlay mode: TCP communication between underlay Pods and overlay Pods (Calico or Cilium) fails

This issue arises from asymmetric packet paths. The request packet matches the routes on the source Pod side and is forwarded through `veth0` to the host, which then forwards it to the target Pod. The target Pod sees the source Pod's underlay IP as the packet's source IP, so its reply goes directly over the underlay network and bypasses the source Pod's host.
The host therefore considers the traffic invalid (it receives an unexpected TCP SYN-ACK that conntrack marks as invalid), and an iptables rule installed by kube-proxy explicitly drops it. Switching kube-proxy to ipvs mode works around this issue, which is expected to be fixed in Kubernetes 1.29.
If the sysctl `nf_conntrack_tcp_be_liberal` is set to 1, kube-proxy does not install this DROP rule.

- Overlay mode: with Cilium as the default CNI and multiple NICs for the Pod, the underlay interface of the Pod cannot communicate with the node.

Macvlan interfaces do not allow direct communication between parent and child interfaces in bridge mode in most cases. To facilitate communication between the underlay IP of the Pod and the node, we rely on the default CNI to create Veth devices. However, Cilium restricts the forwarding of IPs from non-Cilium subnets through these Veth devices.
Binary file added docs/images/spiderpool_service_kube_proxy.png
Binary file added docs/images/withou_kube_proxy.png
3 changes: 2 additions & 1 deletion docs/mkdocs.yml
@@ -74,16 +74,17 @@ nav:
- Egress Policy: usage/egress.md
- Route Support: usage/route.md
- Service Support: usage/service.md
- Plugin coordinator: usage/coordinator.md
- Plugin ifacer: usage/ifacer.md
- Node-based Topology: usage/network-topology.md
- RDMA: usage/rdma.md
- Access Service for Underlay CNI: usage/underlay_cni_service.md
- Kubevirt: usage/kubevirt.md
- FAQ: usage/debug.md
- Concepts:
- Architecture: concepts/arch.md
- IPAM: concepts/ipam.md
- IPAM Performance: concepts/ipam-performance.md
- Plugin coordinator: concepts/coordinator.md
- I/O Performance: concepts/io-performance.md
- Blogs: concepts/blog.md
- Reference: