
The CustomResourceDefinition "clusterpolicies.nvidia.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes #194

Closed
d-m opened this issue May 25, 2021 · 2 comments

d-m commented May 25, 2021

I get the following error when applying the custom resource definition for clusterpolicy objects:

$ kubectl apply -f https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/deployments/gpu-operator/crds/nvidia.com_clusterpolicies_crd.yaml
The CustomResourceDefinition "clusterpolicies.nvidia.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes                                                                                                                                    

After some research, it seems this is due to the last-applied-configuration annotation that kubectl apply places on the CRD resource (see kubernetes-sigs/kubebuilder#1140 (comment)). I verified that kubectl create works; however, I'm concerned that updating the CRD with kubectl replace going forward may cause issues with deployed ClusterPolicy objects by deleting and recreating the CRD.
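
For reference, here is a minimal sketch of the two workarounds, assuming a kubectl and cluster version recent enough to support server-side apply (which does not write the client-side last-applied-configuration annotation):

# Workaround 1: create the CRD, which skips the last-applied-configuration annotation
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/deployments/gpu-operator/crds/nvidia.com_clusterpolicies_crd.yaml

# Workaround 2: server-side apply, which tracks field ownership on the server
# instead of storing the full object in an annotation
$ kubectl apply --server-side -f https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/deployments/gpu-operator/crds/nvidia.com_clusterpolicies_crd.yaml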

@shivamerla (Contributor) commented

@d-m Since the ClusterPolicy spec handles the creation of eight Daemonsets, the size of the CR has become huge. Yes, the limitation with the last-applied-configuration annotation will break upgrades. Since all Daemonset values need to be configurable, I'm not sure we can overcome this with the single CRD we have. For example, the driver spec is below:

driver:
  enabled: true
  repository: nvcr.io/nvidia
  image: driver
  version: "460.73.01"
  imagePullPolicy: IfNotPresent
  imagePullSecrets: []
  env: []
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
  nodeSelector:
    nvidia.com/gpu.deploy.driver: "true"
  affinity: {}
  podSecurityContext: {}
  securityContext:
    privileged: true
    seLinuxOptions:
      level: "s0"
  resources: {}
  # private mirror repository configuration
  repoConfig:
    configMapName: ""
    destinationDir: ""
  # vGPU licensing configuration
  licensingConfig:
    configMapName: ""
  priorityClassName: system-node-critical

With the limitation on the max size of the CR (which ends up in the last-applied-configuration annotation), it would make sense to split each Daemonset config into a separate CRD (i.e. NvidiaDriver, NvidiaDevicePlugin, NvidiaDCGMExporter, NvidiaGPUFeatureDiscovery, NvidiaMIGManager, NvidiaContainerToolkit, NvidiaValidator), with individual CRs controlling the configuration for each Daemonset we deploy.
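
For illustration only, a rough sketch of what a split-out NvidiaDriver CR could look like if the driver section above became its own CRD; the apiVersion, kind, and field layout here are assumptions based on the proposal, not an existing API:

# hypothetical example sketched from the proposal above, not an existing API
apiVersion: nvidia.com/v1
kind: NvidiaDriver
metadata:
  name: default
spec:
  repository: nvcr.io/nvidia
  image: driver
  version: "460.73.01"
  imagePullPolicy: IfNotPresent
  nodeSelector:
    nvidia.com/gpu.deploy.driver: "true"
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

Each such CR would stay small, so the last-applied-configuration annotation written by kubectl apply would remain well under the 262144-byte limit.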

Currently we don't support upgrades of ClusterPolicy types, so uninstall and reinstall is always recommended. We are looking to support upgrades in future releases, so this will be a design discussion for us.

shivamerla self-assigned this May 25, 2021
@shivamerla (Contributor) commented

fixed with: f839a70
