This repository has been archived by the owner on May 27, 2024. It is now read-only.

fail to create nodefeature #47

Closed
ejlee125 opened this issue Jun 29, 2023 · 1 comment


ejlee125 commented Jun 29, 2023

Hello,
I am trying to set up Kubernetes on MIG GPUs with nvidia-device-plugin and gpu-feature-discovery.
I installed both repos with Helm 3, and "kubectl describe node" shows "nvidia-com:mig-~~" in the Capacity and Allocatable sections. The "feature.node.kubernetes.io/cpu-" items are also listed in the Labels section. But I cannot see any labels that start with "nvidia.com".

The gpu-node-feature pod also shows this error:

E0629 02:05:03.783259       1 main.go:95] failed to create NodeFeature object "nvidia-features-for-": NodeFeature.nfd.k8s-sigs.io "nvidia-features-for-" is invalid: metadata.name: Invalid value: "nvidia-features-for-": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
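
For readers parsing the message above: the rejected name "nvidia-features-for-" ends with a hyphen, which the quoted regex does not allow. The following is only a minimal sketch that applies that same regex in Python to show the check. The function name `is_rfc1123_subdomain` and the node name `node1` are made up for illustration, and the sketch covers only the pattern, not any other checks the API server may apply.

```python
import re

# Regex copied from the error message above. Python's fullmatch() is used so
# that the whole name has to match the pattern.
RFC1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def is_rfc1123_subdomain(name: str) -> bool:
    """Return True if `name` fully matches the pattern quoted in the error."""
    return RFC1123_SUBDOMAIN.fullmatch(name) is not None


# The name from the log ends in '-', so it does not match.
print(is_rfc1123_subdomain("nvidia-features-for-"))
# Hypothetical name with a non-empty suffix ("node1" is an assumption).
print(is_rfc1123_subdomain("nvidia-features-for-node1"))
```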

gpu-feature-discovery : 0.8.0
nvidia-device-plugin : 0.12.0

How can I fix this?


ejlee125 commented Jul 4, 2023

I found that the failure was caused by a ClusterRole issue in gpu-feature-discovery.
After I added the "create" verb for the nodefeatures resource in the gpu-feature-discovery ClusterRole, the "feature.node" labels were listed on the node successfully.
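
As a rough illustration of the permission change described above, here is a small Python sketch that models ClusterRole rules as plain dictionaries and checks whether a verb is granted. It does not talk to a cluster. The helper name `allows` and the "before" verb list are assumptions for illustration only; the API group "nfd.k8s-sigs.io" and the resource "nodefeatures" are taken from the error message and the fix above.

```python
NFD_GROUP = "nfd.k8s-sigs.io"  # API group as it appears in the error message


def _matches(values, wanted):
    """A rule field matches if it lists the wanted value or the '*' wildcard."""
    return "*" in values or wanted in values


def allows(rules, api_group, resource, verb):
    """Return True if any rule grants `verb` on `resource` in `api_group`."""
    return any(
        _matches(rule.get("apiGroups", []), api_group)
        and _matches(rule.get("resources", []), resource)
        and _matches(rule.get("verbs", []), verb)
        for rule in rules
    )


# Hypothetical "before" state: these verbs are an assumption, not taken from
# the actual gpu-feature-discovery chart.
rules_before = [
    {"apiGroups": [NFD_GROUP], "resources": ["nodefeatures"],
     "verbs": ["get", "list", "watch"]},
]

# "After" state: the same rule with the "create" verb added, as in the fix.
rules_after = [
    {"apiGroups": [NFD_GROUP], "resources": ["nodefeatures"],
     "verbs": ["get", "list", "watch", "create"]},
]

print(allows(rules_before, NFD_GROUP, "nodefeatures", "create"))
print(allows(rules_after, NFD_GROUP, "nodefeatures", "create"))
```

On a real cluster, "kubectl auth can-i" is the usual way to answer the same question for a given service account.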

ejlee125 closed this as completed Jul 4, 2023