
[Bug] Issues with RayCluster CRD and kubectl apply #271

Closed

DmitriGekhtman opened this issue May 20, 2022 · 13 comments

@DmitriGekhtman
Collaborator

DmitriGekhtman commented May 20, 2022

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

Others

What happened + What you expected to happen

kubectl apply -k manifests/cluster-scope-resources yields the error
The CustomResourceDefinition "rayclusters.ray.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes.

Reason:
After re-generating the KubeRay CRD in #268, some pod template fields from recent versions of K8s were added to the generated schema. The CRD is now too large to fit in the kubectl.kubernetes.io/last-applied-configuration annotation that kubectl apply writes.

The solution I'd propose is to move the CRD out of the kustomization file and advise users to kubectl create the CRD before installing the rest of the cluster-scoped resources.
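
A rough sketch of that flow, assuming the CRD has been split into its own manifest (the CRD file path below is hypothetical, not the actual repo layout):

# Create the CRD directly; kubectl create does not write the oversized
# last-applied-configuration annotation (CRD file path is hypothetical):
kubectl create -f manifests/cluster-scope-resources/raycluster-crd.yaml

# Then install the remaining cluster-scoped resources and the operator as usual:
kubectl apply -k manifests/cluster-scope-resources
kubectl apply -k manifests/base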

Reproduction script

See above.

Anything else

After running kubectl apply -k, I tried to kubectl delete -k so that I could subsequently kubectl create -k.
Unfortunately, my ray-system namespace is hanging in a terminating state!
edit: My ray-system namespace is hanging simply because the cluster is 100% borked.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
DmitriGekhtman added the bug (Something isn't working) label on May 20, 2022
@askainet

In case this helps other people using ArgoCD to deploy KubeRay: we solved this issue with a Kustomization that patches the RayCluster CRD with the annotation argocd.argoproj.io/sync-options: Replace=true, which makes ArgoCD use kubectl replace instead of kubectl apply when syncing this particular resource:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - https://github.com/ray-project/kuberay/manifests/cluster-scope-resources/?ref=master
  - https://github.com/ray-project/kuberay/manifests/base/?ref=master

patchesStrategicMerge:
  # CRD rayclusters.ray.io manifest is too big to fit in the
  # annotation `kubectl.kubernetes.io/last-applied-configuration`
  # added by `kubectl apply` used by ArgoCD, and so it fails
  # https://github.com/ray-project/kuberay/issues/271
  # Annotate this CRD to make ArgoCD use `kubectl replace` and avoid the error when syncing it
  - |-
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: rayclusters.ray.io
      annotations:
        argocd.argoproj.io/sync-options: Replace=true

@23ewrdtf

I have the same issue.

@DmitriGekhtman
Collaborator Author

We'll start by replacing "apply" in the docs with "create". Then we'll look into shrinking the CRD.
It seems this bug comes up from time to time in various K8s projects...
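
For reference, a minimal sketch of the create-based flow (kubectl create does not write the client-side last-applied-configuration annotation, so the 262144-byte limit is not hit; note that re-running create against resources that already exist will fail):

kubectl create -k manifests/cluster-scope-resources
kubectl apply -k manifests/base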

@goswamig

goswamig commented Jun 6, 2022

Also, to extend this: shouldn't we show status and restarts for running clusters?

$ kubectl get rayclusters 
NAME                  AGE
raycluster-complete   7m48s

it used to be

$ kubectl -n ray get rayclusters
NAME              STATUS    RESTARTS   AGE
example-cluster   Running   0          53s
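
For context, the columns shown by kubectl get for a custom resource come from additionalPrinterColumns in the CRD; a minimal sketch, assuming a .status.state field populated by the controller (the exact field path is an assumption, and the schema is abbreviated):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: rayclusters.ray.io
spec:
  group: ray.io
  names:
    kind: RayCluster
    plural: rayclusters
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      additionalPrinterColumns:
        - name: Status
          type: string
          jsonPath: .status.state        # assumed status field
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true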

@DmitriGekhtman
Collaborator Author

DmitriGekhtman commented Jun 6, 2022

Status could make sense -- it would simply indicate the status of the head pod.
Restarts are a bit flimsier as a notion because we don't quite have a coherent notion of what constitutes a restart -- I guess that would mean the number of head container restarts + the number of head pod replacements.

We could potentially take a look at what the K8s deployment controller does.
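
For comparison, the Deployment printer columns surface readiness rather than restart counts (output below is illustrative):

$ kubectl get deployments
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
kuberay-operator   1/1     1            1           5m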

DmitriGekhtman added a commit that referenced this issue Jun 11, 2022
…#302)

This PR adds a warning about a known issue (#271) to the KubeRay docs.
Jeffwan added this to the v0.4.0 release milestone on Jul 27, 2022
@Dhouti

Dhouti commented Aug 18, 2022

Try using kubectl apply --server-side
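
For example (using the manifest paths from this thread); server-side apply tracks field ownership in managedFields on the server instead of in the client-side last-applied-configuration annotation, so the annotation size limit does not apply:

kubectl apply --server-side -k manifests/cluster-scope-resources
kubectl apply --server-side -k manifests/base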

@haoxins
Contributor

haoxins commented Sep 6, 2022

For the Argo CD users, maybe we can add some instructions into the document?
Just like I did for the Flink operator project
https://github.com/apache/flink-kubernetes-operator/blob/main/docs/content/docs/operations/helm.md#working-with-argo-cd

@DmitriGekhtman
Collaborator Author

For the Argo CD users, maybe we can add some instructions into the document? Just like I did for the Flink operator project https://github.com/apache/flink-kubernetes-operator/blob/main/docs/content/docs/operations/helm.md#working-with-argo-cd

@haoxins
That sounds good.
If you have a working set-up with Argo CD / Helm / KubeRay, feel free to open a PR adding the relevant info to the README!
https://github.com/ray-project/kuberay/blob/master/helm-chart/kuberay-operator/README.md

@haoxins
Contributor

haoxins commented Sep 6, 2022


#535

@DmitriGekhtman
Collaborator Author

We could update the docs to mention that kubectl apply --server-side works.

@DmitriGekhtman
Collaborator Author

DmitriGekhtman commented Nov 4, 2022

I think for the moment, the only actionable item is the documentation item described in the last comment.
Going to remove the 0.4.0 milestone label from this issue because docs are not currently versioned.

DmitriGekhtman removed this from the v0.4.0 release milestone on Nov 4, 2022
tekumara added a commit to tekumara/ray-demo that referenced this issue Dec 3, 2022
@kevin85421
Member

I thought this had already been documented.

lowang-bh pushed a commit to lowang-bh/kuberay that referenced this issue Sep 24, 2023
@gushob21

gushob21 commented Dec 20, 2023

Can we not install both the CRDs and the KubeRay operator using Kustomize at the same time? When I try to do so, it throws the following error:

admin@instance-1:~$ cat kustomization.yaml 
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
  name: "ray"
resources:
 - "https://github.com/ray-project/kuberay/manifests/cluster-scope-resources?ref=v1.0.0&timeout=90s"
 - "https://github.com/ray-project/kuberay/manifests/base?ref=v1.0.0&timeout=90s"

admin@instance-1:~$ kustomize build .

Error: accumulating resources: accumulation err='accumulating resources from 'https://github.com/ray-project/kuberay/manifests/base?ref=v1.0.0&timeout=90s': URL is a git repository': recursed merging from path '/tmp/kustomize-3664874623/manifests/base': may not add resource with an already registered id: Namespace.v1.[noGrp]/ray-system.[noNs]
