Merge pull request #3902 from spidernet-io/robot/cherrypick/pr3778/release-v1.0

fix: Spiderpool GC incorrect IP address during statefulset Pod scale up/down, causing IP conflict
weizhoublue committed Aug 22, 2024
2 parents c1ec2e0 + 0fe1c88 commit 099ca3d
Showing 22 changed files with 1,500 additions and 169 deletions.
4 changes: 4 additions & 0 deletions docs/usage/install/upgrade-zh_CN.md
@@ -149,6 +149,10 @@ kubectl patch sp ${auto-pool} --type merge --patch '{"metadata": {"labels": {"ip
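Because version 0.9.0 added the `txQueueLen` field to the [SpiderCoordinator CRD](./../../reference/crd-spidercoordinator.md), and Helm does not support upgrading or deleting CRDs during an upgrade, you need to update the CRD manually before upgrading. (We recommend skipping 0.9.0 and upgrading directly to 0.9.1.)
### Notes on upgrading from 0.9.4 or below (including 0.9.4) to the latest version
In versions below 0.9.4, when a StatefulSet is scaled up or down rapidly, Spiderpool GC may mistakenly reclaim IP addresses in an IPPool, so the same IP is assigned to multiple Pods in the K8S cluster and IP address conflicts occur. This issue has been fixed (see the [fix](https://github.com/spidernet-io/spiderpool/pull/3778)), but after the upgrade Spiderpool cannot automatically correct IP addresses that already conflict. You need to manually restart the Pods that hold a conflicting IP to resolve the conflict. The new version no longer produces IP conflicts caused by incorrect IP GC.
### Notes on upgrading other versions
*TODO.*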
4 changes: 4 additions & 0 deletions docs/usage/install/upgrade.md
@@ -149,6 +149,10 @@ In versions below 0.7.3, Spiderpool will enable a set of DaemonSet: `spiderpool-
Version 0.9.0 added the `txQueueLen` field to the [SpiderCoordinator CRD](./../../reference/crd-spidercoordinator.md). Because Helm does not support upgrading or deleting CRDs during an upgrade, you need to update the CRD manually before upgrading. (We suggest skipping version 0.9.0 and upgrading directly to version 0.9.1.)
### Upgrading from version 0.9.4 or below (including 0.9.4) to a higher version
In versions below 0.9.4, when a StatefulSet is rapidly scaled up or down, Spiderpool GC may mistakenly reclaim IP addresses in an IPPool. The same IP can then be assigned to multiple Pods in the K8S cluster, resulting in IP address conflicts. This issue has been fixed (see the [fix](https://github.com/spidernet-io/spiderpool/pull/3778)), but after the upgrade Spiderpool cannot automatically correct IP addresses that already conflict. You need to manually restart the Pods that hold a conflicting IP to resolve the conflict. The new version no longer produces IP conflicts caused by incorrect IP GC.
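Before restarting anything, it helps to know which Pods actually share an IP. The following is a minimal sketch, not part of Spiderpool: the `podIP` type and `findConflicts` function are hypothetical names, and the input is assumed to be a list of Pod/IP pairs that you have already collected (for example from the Pod status of every Pod in the cluster).

```go
package main

import (
	"fmt"
	"sort"
)

// podIP is a hypothetical, simplified record: one Pod and the IP it reports.
type podIP struct {
	Namespace string
	Name      string
	IP        string
}

// findConflicts groups Pods by IP and returns only the IPs that are
// held by more than one Pod, mapped to the sorted "namespace/name" owners.
func findConflicts(pods []podIP) map[string][]string {
	byIP := make(map[string][]string)
	for _, p := range pods {
		if p.IP == "" {
			continue // the Pod has no IP assigned yet
		}
		byIP[p.IP] = append(byIP[p.IP], p.Namespace+"/"+p.Name)
	}

	conflicts := make(map[string][]string)
	for ip, owners := range byIP {
		if len(owners) > 1 {
			sort.Strings(owners)
			conflicts[ip] = owners
		}
	}
	return conflicts
}

func main() {
	// Example data only; the names and addresses are made up.
	pods := []podIP{
		{Namespace: "default", Name: "web-0", IP: "10.6.0.10"},
		{Namespace: "default", Name: "web-1", IP: "10.6.0.11"},
		{Namespace: "default", Name: "db-0", IP: "10.6.0.10"},
	}
	for ip, owners := range findConflicts(pods) {
		fmt.Printf("IP %s is shared by %v; restart these Pods\n", ip, owners)
	}
}
```

Every Pod listed for a conflicting IP is a candidate for the manual restart described above.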
### More notes on version upgrades
*TODO.*
4 changes: 2 additions & 2 deletions go.mod
@@ -33,7 +33,7 @@ require (
 	github.com/sasha-s/go-deadlock v0.3.1
 	github.com/spf13/cobra v1.8.0
 	github.com/spf13/pflag v1.0.5
-	github.com/spidernet-io/e2eframework v0.0.0-20240130031916-71bf7b1ddd00
+	github.com/spidernet-io/e2eframework v0.0.0-20240816061218-9ba7f53b8c73
 	github.com/tigera/operator v1.33.0
 	github.com/vishvananda/netlink v1.2.1-beta.2.0.20230621221334-77712cff8739
 	go.opentelemetry.io/otel v1.25.0
@@ -77,6 +77,7 @@ require k8s.io/component-base v0.29.4 // indirect

 require (
 	github.com/hashicorp/go-multierror v1.1.1
+	k8s.io/kubectl v0.26.3
 	k8s.io/kubelet v0.29.2
 	tags.cncf.io/container-device-interface v0.6.2
 	tags.cncf.io/container-device-interface/specs-go v0.6.0
@@ -193,7 +194,6 @@ require (
 	gopkg.in/ini.v1 v1.67.0 // indirect
 	k8s.io/gengo/v2 v2.0.0-20240228010128-51d4e06bde70 // indirect
 	k8s.io/kube-openapi v0.0.0-20240228011516-70dd3763d340 // indirect
-	k8s.io/kubectl v0.26.3 // indirect
 	kubevirt.io/containerized-data-importer-api v1.57.0-alpha1 // indirect
 	kubevirt.io/controller-lifecycle-operator-sdk/api v0.0.0-20220329064328-f3cc58c6ed90 // indirect
 	sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
4 changes: 2 additions & 2 deletions go.sum
@@ -522,8 +522,8 @@ github.com/spf13/pflag v1.0.5 h1:iy+VFUOCP1a+8yFto/drg2CJ5u0yRoB7fZw3DKv/JXA=
 github.com/spf13/pflag v1.0.5/go.mod h1:McXfInJRrz4CZXVZOBLb0bTZqETkiAhM9Iw0y3An2Bg=
 github.com/spf13/viper v1.16.0 h1:rGGH0XDZhdUOryiDWjmIvUSWpbNqisK8Wk0Vyefw8hc=
 github.com/spf13/viper v1.16.0/go.mod h1:yg78JgCJcbrQOvV9YLXgkLaZqUidkY9K+Dd1FofRzQg=
-github.com/spidernet-io/e2eframework v0.0.0-20240130031916-71bf7b1ddd00 h1:e6+I4kKloty0a6bV9y1s8lF+Xb3AX+yUdj53J9EsfJw=
-github.com/spidernet-io/e2eframework v0.0.0-20240130031916-71bf7b1ddd00/go.mod h1:k0KYxyNjZYyEG1bsGzSbMx5Q+Z1H6oOjEq5qz9UlBzY=
+github.com/spidernet-io/e2eframework v0.0.0-20240816061218-9ba7f53b8c73 h1:KzfBFPaiBnT6LBVhwrabJ59o/0Vsv/9CKszUgaz1TIs=
+github.com/spidernet-io/e2eframework v0.0.0-20240816061218-9ba7f53b8c73/go.mod h1:k0KYxyNjZYyEG1bsGzSbMx5Q+Z1H6oOjEq5qz9UlBzY=
 github.com/stoewer/go-strcase v1.2.0/go.mod h1:IBiWB2sKIp3wVVQ3Y035++gc+knqhUQag1KpM8ahLw8=
 github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
 github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw=
5 changes: 5 additions & 0 deletions pkg/gcmanager/pod_cache.go
@@ -32,6 +32,7 @@ type PodEntry struct {
 	PodName   string
 	Namespace string
 	NodeName  string
+	UID       string
 
 	EntryUpdateTime  time.Time
 	TracingStartTime time.Time
@@ -169,10 +170,12 @@ func (s *SpiderGC) buildPodEntry(oldPod, currentPod *corev1.Pod, deleted bool) (

 	// deleted pod
 	if deleted {
+
 		podEntry := &PodEntry{
 			PodName:             currentPod.Name,
 			Namespace:           currentPod.Namespace,
 			NodeName:            currentPod.Spec.NodeName,
+			UID:                 string(currentPod.UID),
 			EntryUpdateTime:     metav1.Now().UTC(),
 			TracingStartTime:    metav1.Now().UTC(),
 			TracingGracefulTime: time.Duration(s.gcConfig.AdditionalGraceDelay) * time.Second,
@@ -244,6 +247,7 @@ func (s *SpiderGC) buildPodEntry(oldPod, currentPod *corev1.Pod, deleted bool) (
 			Namespace:        currentPod.Namespace,
 			NodeName:         currentPod.Spec.NodeName,
 			EntryUpdateTime:  metav1.Now().UTC(),
+			UID:              string(currentPod.UID),
 			TracingStartTime: currentPod.DeletionTimestamp.Time,
 			PodTracingReason: podStatus,
 		}
Expand All @@ -263,6 +267,7 @@ func (s *SpiderGC) buildPodEntry(oldPod, currentPod *corev1.Pod, deleted bool) (
 			PodName:          currentPod.Name,
 			Namespace:        currentPod.Namespace,
 			NodeName:         currentPod.Spec.NodeName,
+			UID:              string(currentPod.UID),
 			EntryUpdateTime:  metav1.Now().UTC(),
 			PodTracingReason: podStatus,
 		}