Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

webhook: automatically insert the rdma resource to pod by DRA claim #3670

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cyclinder
Copy link
Collaborator

Thanks for contributing!

What type of PR is this?

  • release/feature

What this PR does / why we need it:

webhook: auto inject the rdma resource to pod.

pod.resourceClaims -> spiderclaimparameters.staticNics -> spidermultusconfig -> resourceName -> pod.resource.requests.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

@cyclinder cyclinder added the release/feature-new release note for new feature label Jun 26, 2024
Copy link

codecov bot commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 0% with 103 lines in your changes missing coverage. Please review.

Project coverage is 79.30%. Comparing base (5ff6c82) to head (7663fa2).

Current head 7663fa2 differs from pull request most recent head 7f26f0b

Please upload reports for the commit 7f26f0b to get more accurate results.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3670      +/-   ##
==========================================
- Coverage   81.68%   79.30%   -2.39%     
==========================================
  Files          50       51       +1     
  Lines        4391     4494     +103     
==========================================
- Hits         3587     3564      -23     
- Misses        643      773     +130     
+ Partials      161      157       -4     
Flag Coverage Δ
unittests 79.30% <0.00%> (-2.39%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
pkg/podmanager/pod_webhook.go 0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes

ClientSet *kubernetes.Clientset
}

func New(enableDra bool, clientSet *kubernetes.Clientset) *PodWebhook {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个 pkg 下 有 NewPodManager , xxxx
因此, 这个 New 要 调整下,随意了点
NewPodWebHook ?

Copy link
Collaborator

@weizhoublue weizhoublue Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

然后 作为 一个 完整的 pkg, 强烈 建议 把 新加逻辑 合入 PodManager interface , 而不是 新来一个 PodWebhook struct, 这样后续 可以最大的实现 相互之间的 代码复用

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个 podManager 以前是作为 get/list pod 的tools, 太多地方引用,比如agent。如果合入进来会对 agent 造成干扰

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

webhook 的逻辑应该只在 controller 里面

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这也是目前其他所有代码位置把manager 和 wehbook 分开的原因

var _ webhook.CustomValidator = &PodWebhook{}

// Default implements admission.CustomDefaulter.
func (pw *PodWebhook) Default(ctx context.Context, obj runtime.Object) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

需要单位测试

return ""
}

func (pw *PodWebhook) injectRdmaResourceToPod(resourceMap map[string]bool, pod *corev1.Pod) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个函数中的逻辑 是 你想的,还是 从其他地方借鉴的 有最佳实践的

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是我自己想的,没找到可借鉴的地方

for resource, found := range resourceMap {
if !found {
// inject the resource to the pod.containers[0].resources.requests
pod.Spec.Containers[0].Resources.Requests[corev1.ResourceName(resource)] = k8s_resource.MustParse("1")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里缺少 一行重要的 info 级别注入日志

Copy link
Collaborator Author

@cyclinder cyclinder Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

在调用函数之前已经log过了,这里不想在函数内部再次 log,对单元测试来说比较麻烦,而且会在 调用函数之后 show pod 的 yaml

@weizhoublue
Copy link
Collaborator

没有 E2E ,至少 需要 把 E2E 设计 提到 doc

@cyclinder cyclinder force-pushed the dra/webhook_resourceName branch 2 times, most recently from 3c361de to aad9b28 Compare June 27, 2024 07:20
}

// try to find the resource in container resources.requests
if _, ok := c.Resources.Requests[corev1.ResourceName(resource)]; ok {
Copy link
Collaborator

@weizhoublue weizhoublue Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果 pod 之前有 resource ,但是 values 已经不是 最新的了(不正确),这里应该要纠正 ?如果不纠正,需要有个日志

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不应该纠正,无法定义 pod.resource 是否正确,可能是用户有意为之

Copy link
Collaborator

@weizhoublue weizhoublue Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果是 用户改了 parameter 这边的定义呢 ?然后重启pod

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

有道理,但这里不太好区分 是用户自己改了 parameter 还是手动改了 pod

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里还要思考一下

Copy link
Collaborator

@weizhoublue weizhoublue Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

用户是不会手动 改 pod,只会 改 deployment , 但是 判断 pod 和 deployment 的 resource 是否一致 , 又有绝大 代码成本,且控制器类型很多

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不管用户改 deploy 还是 pod,对于webhook 来说都是改 pod

@cyclinder cyclinder force-pushed the dra/webhook_resourceName branch 4 times, most recently from 397294b to 2df3d2a Compare June 27, 2024 10:42
@cyclinder cyclinder force-pushed the dra/webhook_resourceName branch 6 times, most recently from 553cb0b to 7f26f0b Compare June 28, 2024 01:59
return nil
}

func (pw *podManager) getStaticNicsFromSpiderClaimParameter(ctx context.Context, pod *corev1.Pod) ([]spiderpoolv2beta1.StaticNic, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个api 应该是在 spiderDra 之类的 目录下

return []spiderpoolv2beta1.StaticNic{}, nil
}

func (pw *podManager) getResourceMapFromStaticNics(ctx context.Context, staticNics []spiderpoolv2beta1.StaticNic) (map[string]bool, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个api 应该是在 spiderDra 之类的 目录下

}

// resourceName return the resourceName for given spiderMultusConfig
func (pw *podManager) resourceName(smc *spiderpoolv2beta1.SpiderMultusConfig) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个api 应该是在 spiderDra 之类的 目录下

return ""
}

func InjectRdmaResourceToPod(resourceMap map[string]bool, pod *corev1.Pod) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个 放到 本目录的 utils.go 比较合适

@weizhoublue weizhoublue changed the title webhook: auto inject the rdma resource to pod webhook: automatically insert the rdma resource to pod by DRA claim Jun 28, 2024
@cyclinder
Copy link
Collaborator Author

仍然在排查 webhook 流程问题:

Events:
  Type     Reason        Age                 From                   Message
  ----     ------        ----                ----                   -------
  Warning  FailedCreate  10s (x13 over 31s)  replicaset-controller  Error creating: Internal error occurred: failed calling webhook "pods.kubernetes.io": failed to call webhook: the server could not find the requested resource
root@10-20-1-200:~# kubectl logs  -n kube-system kube-apiserver-spider-control-plane | grep pods.kubernetes.io
W0628 10:24:25.536243       1 dispatcher.go:225] Failed calling webhook, failing closed pods.kubernetes.io: failed calling webhook "pods.kubernetes.io": failed to call webhook: the server could not find the requested resource
	logging error output: "k8s\x00\n\f\n\x02v1\x12\x06Status\x12\xbe\x02\n\x06\n\x00\x12\x00\x1a\x00\x12\aFailure\x1a\x8e\x01Internal error occurred: failed calling webhook \"pods.kubernetes.io\": failed to call webhook: the server could not find the requested resource\"\rInternalError*\x87\x01\n\x00\x12\x00\x1a\x00\"{\n\x00\x12ufailed calling webhook \"pods.kubernetes.io\": failed to call webhook: the server could not find the requested resource\x1a\x00(\x002\x000\xf4\x03\x1a\x00\"\x00"
W0628 10:24:25.553750       1 dispatcher.go:225] Failed calling webhook, failing closed pods.kubernetes.io: failed calling webhook "pods.kubernetes.io": failed to call webhook: the server could not find the requested resource

@weizhoublue
Copy link
Collaborator

the server could not find the requested resource

the server could not find the requested resource ---- is spiderpool parameter CRD not registerred ?

@weizhoublue
Copy link
Collaborator

just go ahead next Monday

@cyclinder
Copy link
Collaborator Author

the server could not find the requested resource ---- is spiderpool parameter CRD not registered ?

No, all spiderpool CRD has been registered, this webhook is just for pods, the error is extraordinary

@@ -238,3 +271,161 @@ func (pm *podManager) GetPodTopController(ctx context.Context, pod *corev1.Pod)
UID: podOwner.UID,
}, nil
}

// Default implements admission.CustomDefaulter.
Copy link
Collaborator

@weizhoublue weizhoublue Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest to add some comments above the function to indicate what the webhook will handle , it will be clear as the feature grows

// whebhook is for:
// 1. when dra is enabled, it insert RDMA resource name to the pod
// ....

@cyclinder
Copy link
Collaborator Author

@weizhoublue 我建议这个webhook 只对带有某些 label 的 pod 生效(比如 resources.spidernet.io/webhook-resources: true),不然可能会影响其他 pod 甚至是 apiserver 的正常启动, 测试过程也发现了这个问题

@@ -144,6 +144,40 @@ webhooks:
- spidercoordinators
sideEffects: None
{{- end }}
{{- if .Values.dra.enabled }}
Copy link
Collaborator

@weizhoublue weizhoublue Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个 不需要 用户判断,由 controller 代码动态 patch

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

什么情况下patch

@cyclinder cyclinder force-pushed the dra/webhook_resourceName branch 2 times, most recently from 5c4bcdc to cef2040 Compare July 17, 2024 13:15
@cyclinder cyclinder force-pushed the dra/webhook_resourceName branch 2 times, most recently from d972169 to 117ee64 Compare July 31, 2024 03:49
Signed-off-by: cyclinder <qifeng.guo@daocloud.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release/feature-new release note for new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants