Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e fix: spiderpoolcontroller Pod termination failed #3688

Merged
merged 1 commit into from
Jul 21, 2024

Conversation

ty-dc
Copy link
Collaborator

@ty-dc ty-dc commented Jul 2, 2024

Thanks for contributing!

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #3651

Special notes for your reviewer:

@ty-dc ty-dc added the release/none no release note label Jul 2, 2024
Copy link

codecov bot commented Jul 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.16%. Comparing base (8839e6c) to head (ccebaf5).
Report is 14 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3688      +/-   ##
==========================================
- Coverage   81.21%   81.16%   -0.05%     
==========================================
  Files          50       50              
  Lines        4391     4391              
==========================================
- Hits         3566     3564       -2     
- Misses        669      670       +1     
- Partials      156      157       +1     
Flag Coverage Δ
unittests 81.16% <ø> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 1 file with indirect coverage changes

@@ -697,6 +697,18 @@ var _ = Describe("test ip with reclaim ip case", Label("reclaim"), func() {
commandStr := "systemctl stop kubelet"
output, err := frame.DockerExecCommand(ctx, workerNodeName, commandStr)
Expect(err).NotTo(HaveOccurred(), "Failed exec '%s' in docker container '%s', error is: %v,log: %v.", commandStr, workerNodeName, err, string(output))
// Prevent spiderpoolcontroller Pod termination failure
podList, err := frame.GetPodListByLabel(map[string]string{"app.kubernetes.io/component": constant.SpiderpoolController})
Copy link
Collaborator

@weizhoublue weizhoublue Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeferCleanup 中的本意 是 “ 如果 用例成功了,那么 希望 所有的 pod 都恢复 running, 且 尤其是 spiderpool controller 恢复 running,且 其 webhook 正常 ” , 以不影响下一个case

所以,新加代码 和 DeferCleanup 中的原本代码,并不直接满足 本意,而是一些 头痛医头 、脚痛医脚的代码。 应该按照 本意 去在 DeferCleanup 中 完毕 所有 恢复工作的闭环

Copy link
Collaborator Author

@ty-dc ty-dc Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

赞同 "DeferCleanup 中的本意",但是这个 case 是构造了冲突 IP 条件,再 stop 了 worker 节点 kubelet。然后重启 kubelet 这个阶段中去等待由于冲突 IP 地址的节点异常。

此处的 DeferCleanup 和新加代码是属于本次 case 中的行为,需要在这个 case 中去闭环。

Copy link
Collaborator

@weizhoublue weizhoublue Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1)那 恢复环境的 动作代码(以避免下一个case) , 在哪?
(2)整个用例是模拟事故,事故中不可能会主动在 “stop kubelet” 后 保障 “SpiderpoolController” 可用,这不是真实事故的“模拟”。 用例应该是模拟事故的真实发生 来测出所有的边际问题

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any update ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 这个 case 标记了串行执行的,ginkgo 默认当一个 case 运行失败,就会终止测试运行。而本次故障模拟,将影响这个 case 本身的通过性,如果故障后,环境却没有恢复,那么这个 case 本身也不会通过。(如果 case 没有通过,那么已经报错退出了,不会再去运行其他 case),所以额外的恢复环境代码,可以不添加。
  2. 该case想要构造节点处于 not-ready 的状态,然后恢复后,spiderpool-controller Pod 可以正常功能,能够释放冲突的 IP 地址。为什么不是 reboot node 模拟故障,由于在 kind 集群模式下,重启 node 总导致一些额外的问题,重启后,环境极难恢复。故选择 “stop kubelet” 模拟故障。

Signed-off-by: ty-dc <tao.yang@daocloud.io>
@weizhoublue weizhoublue merged commit e8945d2 into spidernet-io:main Jul 21, 2024
48 of 53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release/none no release note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

After restarting the Pod, it prompts spiderpool-controller is still not ready with webhook
2 participants