
fix: Spiderpool GC incorrect IP address during statefulset Pod scale up/down, causing IP conflict #3778

Merged
merged 1 commit into main from fix/error-gc
Aug 19, 2024

Conversation

ty-dc
Collaborator

@ty-dc ty-dc commented Jul 26, 2024

Thanks for contributing!

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #3751 #3817

Special notes for your reviewer:

Symptoms:

  1. When a statefulSet application undergoes a rapid scale-down followed by a scale-up, the re-created Pods occasionally end up with conflicting IP addresses.
  2. When a statefulSet application first uses Spiderpool IPAM and then stops using it, stale spiderEndpoint objects and IPPool IP records remain even after the GC All process has run.

Root cause

Symptom 1

  1. When a statefulSet undergoes a rapid scale-down followed by a scale-up, handlePodEntryForTracingTimeOut detects that the Pod deletion has timed out, but it does not check whether the Pod's status.IPs is empty (if it is empty, the address has in fact already been reclaimed correctly by cmdDel). Because of the missing check, the reclamation still runs even when status.IPs is empty. If another Pod happens to have started, requested an address from Spiderpool, and been allocated that same IP, the reclamation wrongly wipes Spiderpool's record of it. The result is a Pod IP that Kubernetes records but that no longer exists in the IPPool. Spiderpool then treats the address as free, and when yet another Pod requests an allocation, Spiderpool hands out the same IP again; two Pods now hold the same address, and an IP conflict occurs.

  2. When a statefulSet undergoes a rapid scale-down followed by a scale-up, a Pod Name can match a SpiderEndpoint Name while their UIDs differ. Such a SpiderEndpoint should be reclaimed, but since its old historical IPs and IPPool names cannot be determined, it should be left alone for GC all to handle. The releaseIPPoolIPExecutor method lacked this check and directly used the UID and IPs recorded in the SpiderEndpoint to perform the reclamation; at that point the SpiderEndpoint may record the IP newly allocated to the restarted Pod, which is then reclaimed by mistake. Kubernetes and the IPPool records fall out of sync, and an IP conflict follows.

Symptom 2

When a statefulSet stops using Spiderpool IPAM, Spiderpool does not reclaim the spiderEndpoint and stale data in tracePod_worker; that is left to the gc All code. When execution reaches string(podYaml.UID) != poolIPAllocation.PodUID, the sts Pod may either have restarted or have stopped using Spiderpool IPAM. The original gc all code only checked whether the workload needs a fixed IP (statefulset, kubevirt) and, if so, skipped it outright without any reclamation, leaving the spiderEndpoint and IP behind.

Symptom 3

After a Job-type Pod reaches the Complete state, its IP is reclaimed but its endpoint is not.

Fix:

Symptom 1

  1. When a stateful application undergoes a rapid scale-down and scale-up and podEntry.TracingStopTime has timed out, additionally check whether the current Pod's IPs in Kubernetes are empty. If they are empty, the address has already been released normally through cmdDel and no reclamation is needed; if they are not empty, the reclamation should proceed. This prevents IP addresses from being reclaimed by mistake.

  2. When a stateful application undergoes a rapid scale-down and scale-up and tracePodWorker finds a Pod whose Name matches a SpiderEndpoint's Name but whose UID differs, that SpiderEndpoint should be reclaimed; however, tracePodWorker cannot trace the IPPool Name and IP information the SpiderEndpoint used. The releaseIPPoolIPExecutor method therefore checks endpoint.Status.Current.UID != podCache.UID and, when they differ, leaves the cleanup to gcAll. (The Pod's k8s.v1.cni.cncf.io/networks-status annotation does not record the IPPool Name, so the historical IP records cannot be recovered, and GC all must handle the case.) Both guards are sketched after this list.
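A minimal Go sketch of the two guards described in this list; every function and parameter name here is illustrative, not the actual Spiderpool code:

package main

import "fmt"

// shouldReclaimOnTracingTimeout mirrors fix 1: when the tracing stop time
// has passed, an empty status.IPs on the live Pod means cmdDel has already
// released the address, so the tracer must not reclaim it again.
func shouldReclaimOnTracingTimeout(livePodIPs []string) bool {
    return len(livePodIPs) != 0
}

// shouldDeferToGCAll mirrors fix 2: if the UID recorded in the SpiderEndpoint
// (endpoint.Status.Current.UID) differs from the cached Pod UID, the endpoint
// may already hold the restarted Pod's new IP, so leave the cleanup to GC all.
func shouldDeferToGCAll(endpointUID, cachedPodUID string) bool {
    return endpointUID != cachedPodUID
}

func main() {
    fmt.Println(shouldReclaimOnTracingTimeout(nil))       // false: already released by cmdDel
    fmt.Println(shouldDeferToGCAll("uid-old", "uid-new")) // true: defer to GC all
}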

Symptom 2

When GC All processes the leftover spiderEndpoint and IPPool IPs of a statefulset workload, which needs a fixed IP address, the following scenarios must be handled so that GC all does not reclaim the IP by mistake and break the fixed-IP guarantee.

  1. If a statefulSet Pod has just restarted and is still in the middle of restarting, its Status.PodIPs may be empty. gc all cannot tell whether it is safe to reclaim the IP address and the spiderEndpoint in this state, so for safety the new code skips it.

  2. If the statefulSet no longer uses Spiderpool IPAM, the Pod UID differs from the Pod UID recorded in the IPPool, and Status.PodIPs is not empty, then the Pod has finished restarting and cmdAdd has already configured its IP address. At that point, check whether the SpiderEndpoint UID and the IPPool-recorded UID match the Pod UID. If they do not match, the statefulSet Pod did not obtain its IP from Spiderpool, and GC all should reclaim both the historical IP in the IPPool and the spiderEndpoint; if they match, only the historical IP in the IPPool needs to be reclaimed. The decision flow is sketched after this list.
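A Go sketch of this decision flow, under the assumption that it runs only after the pool-recorded Pod UID has been found to differ from the live Pod UID; all names are illustrative:

package main

import "fmt"

type gcDecision int

const (
    skipForSafety        gcDecision = iota // restart may still be in progress
    reclaimPoolIPOnly                      // Pod is still allocated by Spiderpool
    reclaimIPAndEndpoint                   // Pod no longer uses Spiderpool IPAM
)

// decide assumes string(pod.UID) != poolIPAllocation.PodUID has already been
// established for a fixed-IP workload (statefulset, kubevirt).
func decide(podIPsEmpty bool, endpointUID, poolRecordUID, podUID string) gcDecision {
    if podIPsEmpty {
        // Scenario 1: Status.PodIPs is empty, so the Pod may still be
        // restarting; reclamation cannot be proven safe, skip it.
        return skipForSafety
    }
    if endpointUID != podUID && poolRecordUID != podUID {
        // Scenario 2: cmdAdd has finished, yet neither record matches the
        // live Pod, so Spiderpool no longer allocates for this Pod.
        return reclaimIPAndEndpoint
    }
    // The endpoint matches the live Pod: only the stale historical IP
    // in the IPPool is leftover.
    return reclaimPoolIPOnly
}

func main() {
    fmt.Println(decide(true, "", "", ""))                 // 0: skip for safety
    fmt.Println(decide(false, "uid-a", "uid-b", "uid-c")) // 2: reclaim IP and endpoint
    fmt.Println(decide(false, "uid-c", "uid-b", "uid-c")) // 1: reclaim pool IP only
}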

@ty-dc ty-dc requested a review from weizhoublue as a code owner July 26, 2024 07:43

codecov bot commented Jul 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.28%. Comparing base (a3c940d) to head (f53668e).
Report is 30 commits behind head on main.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3778      +/-   ##
==========================================
+ Coverage   81.16%   81.28%   +0.11%     
==========================================
  Files          50       50              
  Lines        4391     4408      +17     
==========================================
+ Hits         3564     3583      +19     
+ Misses        670      669       -1     
+ Partials      157      156       -1     
Flag        Coverage Δ
unittests   81.28% <100.00%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines                                  Coverage Δ
...orkloadendpointmanager/workloadendpoint_manager.go     100.00% <100.00%> (ø)

... and 1 file with indirect coverage changes

@ty-dc ty-dc changed the title fix: Spiderpool GC incorrect IP address during sts Pod scale up/down,… fix: Spiderpool GC incorrect IP address during statefulset Pod scale up/down, causing IP conflict Jul 26, 2024
@ty-dc ty-dc force-pushed the fix/error-gc branch 2 times, most recently from d636008 to 2aae3f1 Compare July 26, 2024 10:04
@ty-dc ty-dc added release/bug pr/not-ready not ready for merging labels Jul 26, 2024
@ty-dc ty-dc force-pushed the fix/error-gc branch 9 times, most recently from 4d58e97 to 64d00e2 Compare July 31, 2024 14:01
@ty-dc ty-dc requested a review from bzsuni as a code owner July 31, 2024 14:01
@ty-dc ty-dc added cherrypick-release-v0.8 Cherry-pick the PR to branch release-v0.8. cherrypick-release-v0.9 labels Aug 1, 2024
@ty-dc ty-dc force-pushed the fix/error-gc branch 5 times, most recently from 810b1ca to a5faf42 Compare August 16, 2024 10:35
wep, err := GetWorkloadByName(f, podYaml.Namespace, podYaml.Name)
if err != nil {
if api_errors.IsNotFound(err) {
return fmt.Errorf("endpoint %s/%s dose not exist", podYaml.Namespace, podYaml.Name)
Collaborator

@weizhoublue weizhoublue Aug 16, 2024

As noted in an earlier comment: while this function iterates over the ippools in the for loop, do not return midway. Print every normal error, and only report an error when the function finally exits, so that all of the problems can be found.

Collaborator Author

done
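A small Go sketch of the requested pattern: log each per-pool failure as it happens and return one aggregated error at loop exit (errors.Join needs Go 1.20+; all names are illustrative):

package main

import (
    "errors"
    "fmt"
)

// checkAllPools keeps iterating past failures so a single bad pool cannot
// hide problems in the remaining ones, then reports everything at the end.
func checkAllPools(pools []string, check func(string) error) error {
    var errs []error
    for _, pool := range pools {
        if err := check(pool); err != nil {
            fmt.Printf("pool %s: %v\n", pool, err) // print and keep going
            errs = append(errs, fmt.Errorf("pool %s: %w", pool, err))
        }
    }
    return errors.Join(errs...) // nil when every pool passed
}

func main() {
    err := checkAllPools([]string{"v4pool-a", "v4pool-b"}, func(pool string) error {
        if pool == "v4pool-b" {
            return errors.New("endpoint does not exist")
        }
        return nil
    })
    fmt.Println(err)
}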

podYaml, err := f.GetPod(podName, podNS)
if err != nil {
if api_errors.IsNotFound(err) {
return fmt.Errorf("pod %s/%s does not exist", podNS, podName)
Collaborator

@weizhoublue weizhoublue Aug 16, 2024

As noted in an earlier comment: while this function iterates over the ippools in the for loop, do not return midway. Print every IP error, and only report an error when the function finally exits, so that all of the problems can be found.
Failing to find the ep here is clearly a normal error, not an API communication failure.

Collaborator Author

done


podNetworkIPs, err := ParsePodNetworkAnnotation(f, podYaml)
if nil != err {
return fmt.Errorf("failed to parse pod network annotation: %v", err)
Collaborator

@weizhoublue weizhoublue Aug 16, 2024

Without printing the yaml, the pod name, and so on here, a failure is impossible to troubleshoot; there is no way to tell why the yaml could not be parsed.

Collaborator Author

done
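For reference, a hedged sketch of the snippet above with the pod identity and annotations wrapped into the error; the fields used here are assumptions about the surrounding test code:

podNetworkIPs, err := ParsePodNetworkAnnotation(f, podYaml)
if nil != err {
    return fmt.Errorf("failed to parse network annotation for pod %s/%s (annotations: %v): %w",
        podYaml.Namespace, podYaml.Name, podYaml.Annotations, err)
}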

f.Log("Failed: The UID %s recorded in IPPool %s for the Pod does not match the UID %s in the endpoint %s/%s.", poolIPAllocation.PodUID, poolName, tmpIPs[idx].UID, podNS, podName)
} else if tmpIPs[idx].IP != poolIP {
isSanity = false
f.Log("Failed: The IP %s recorded in IPPool %s for the Pod %s/%s does not match the IP %s in the endpoint %s/%s.", poolIP, poolName, podNS, podName, tmpIPs[idx].IP, podNS, podName)
Collaborator

@weizhoublue weizhoublue Aug 16, 2024

Shouldn't these f.Log calls print at the error level?
Here both the success and the failure logs go through f.Log.

Collaborator Author

@ty-dc ty-dc Aug 19, 2024

done

  "msg"="Failed" "error"="the IP 192.158.26.2 recorded in IPPool v4pool-b7b3e87c9dcc33e for the Pod gc-chaos-ns-5056-466058729/gc-chaos-kruise-sts-1-5143-327365450-0 does not match the IP 192.158.26.2 in the endpoint gc-chaos-ns-5056-466058729/gc-chaos-kruise-sts-1-5143-327365450-0"

@ty-dc ty-dc force-pushed the fix/error-gc branch 2 times, most recently from 54a6fab to bb639f5 Compare August 17, 2024 16:01
@weizhoublue
Collaborator

S [SKIPPED] [0.001 seconds]
Chaos Testing of GC [reclaim]
/home/runner/work/spiderpool/spiderpool/test/e2e/reclaim/chaos_test.go:27
  Chaos Testing of GC [BeforeEach]
  /home/runner/work/spiderpool/spiderpool/test/e2e/reclaim/chaos_test.go:212
    Multiple resource types compete for a single IPPool. In scenarios of creation, scaling up/down, and deletion, GC all can correctly handle IP addresses. [G00012, overlay]
    /home/runner/work/spiderpool/spiderpool/test/e2e/reclaim/chaos_test.go:275

  Timeline >>
  > Enter [BeforeEach] Chaos Testing of GC - /home/runner/work/spiderpool/spiderpool/test/e2e/reclaim/chaos_test.go:212 @ 08/17/24 16:23:58.476
  [SKIPPED] overlay CNI is not installed , ignore this case
  In [BeforeEach] at: /home/runner/work/spiderpool/spiderpool/test/e2e/reclaim/chaos_test.go:214 @ 08/17/24 16:23:58.477
  < Exit [BeforeEach] Chaos Testing of GC - /home/runner/work/spiderpool/spiderpool/test/e2e/reclaim/chaos_test.go:212 @ 08/17/24 16:23:58.477 (1ms)
  << Timeline

@weizhoublue
Collaborator

SUCCESS! - Suite skipped in BeforeSuite -- 0 Passed | 0 Failed | 0 Pending | 12 Skipped

@weizhoublue
Collaborator

SUCCESS! - Suite skipped in BeforeSuite -- 0 Passed | 0 Failed | 0 Pending | 3 Skipped

@weizhoublue
Collaborator

SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 7 Skipped

@weizhoublue
Collaborator

SUCCESS! - Suite skipped in BeforeSuite -- 0 Passed | 0 Failed | 1 Pending | 1 Skipped

@ty-dc
Collaborator Author

ty-dc commented Aug 19, 2024

This case takes a long time, and the underlay suite already has many cases and a long overall runtime, so this case is run in the overlay suite. That shortens the overall CI time.

• [290.750 seconds]
Chaos Testing of GC [reclaim]
/home/runner/work/spiderpool/spiderpool/test/e2e/reclaim/chaos_test.go:27
  Chaos Testing of GC
  /home/runner/work/spiderpool/spiderpool/test/e2e/reclaim/chaos_test.go:198
    Multiple resource types compete for a single IPPool. In scenarios of creation, scaling up/down, and deletion, GC all can correctly handle IP addresses. [G00012, overlay]
    /home/runner/work/spiderpool/spiderpool/test/e2e/reclaim/chaos_test.go:275

  Timeline >>
  > Enter [BeforeEach] Chaos Testing of GC - /home/runner/work/spiderpool/spiderpool/test/e2e/reclaim/chaos_test.go:212 @ 08/17/24 16:20:53.909
  SPIDERPOOL_GC_DEFAULT_INTERVAL_DURATION: 30 
  succeed to create namespace gc-chaos-ns-1740-6212094 
  succeed to create v4 IPPool v4pool-a15d24dbf181e42 
  succeed to create v6 IPPool v6pool-9e38e34988d1589 
  < Exit [BeforeEach] Chaos Testing of GC - /home/runner/work/spiderpool/spiderpool/test/e2e/reclaim/chaos_test.go:212 @ 08/17/24 16:20:54.994 (1.086s)
  > Enter [It] Multiple resource types compete for a single IPPool. In scenarios of creation, scaling up/down, and deletion, GC all can correctly handle IP addresses. - /home/runner/work/spiderpool/spiderpool/test/e2e/reclaim/chaos_test.go:275 @ 08/17/24 16:20:54.995
  succeed to generate annotations map[ipam.spidernet.io/ippool:{"ipv4":["v4pool-a15d24dbf181e42"],"ipv6":["v6pool-9e38e34988d1589"]} v1.multus-cni.io/default-network:kube-system/macvlan-vlan0] 
  Succeeded in creating gc-chaos-ns-1740-6212094/gc-chaos-deploy-1-2054-995260084 at time 2024-08-17 16:20:59.995798619 +0000 UTC m=+200.277877970 
  Succeeded in creating gc-chaos-ns-1740-6212094/gc-chaos-kruise-sts-1-2054-995274632 at time 2024-08-17 16:21:07.996414739 +0000 UTC m=+208.278494080 

@weizhoublue
Copy link
Collaborator

Please reply to my comment from last Friday.

… causing IP conflict

Signed-off-by: ty-dc <tao.yang@daocloud.io>
Labels
cherrypick-release-v0.8 Cherry-pick the PR to branch release-v0.8. cherrypick-release-v0.9 cherrypick-release-v1.0 Cherry-pick the PR to branch release-v1.0. release/bug

Successfully merging this pull request may close these issues.

After the statefulset quickly experiences scale up/down, an IP conflict occurs in the statefulset Pod.