Search before asking
I had searched in the issues and found no similar feature requirement.
Description
The RayCluster controller handles edge cases where multiple head Pods are created. This is possible in some extreme cases, even though we have already implemented expectations.
Relevant code: kuberay/ray-operator/controllers/ray/raycluster_controller.go, line 736 in 0d848f9.
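In the general Kubernetes controller sense, "expectations" means recording create requests that the controller has issued but not yet seen in its informer cache, and skipping further creates until the cache catches up. The sketch below is a hypothetical, simplified tracker written only to illustrate that idea. The names (`expectations`, `reconcileHead`) are assumptions, not KubeRay's API, and the issue does not say which extreme cases get past expectations.

```go
package main

import (
	"fmt"
	"sync"
)

// expectations is a hypothetical, simplified tracker illustrating the
// controller "expectations" pattern. It is NOT KubeRay's implementation.
type expectations struct {
	mu      sync.Mutex
	pending map[string]int // cluster key -> creates issued but not yet observed
}

func newExpectations() *expectations {
	return &expectations{pending: map[string]int{}}
}

// ExpectCreations records that the controller just issued n create requests.
func (e *expectations) ExpectCreations(key string, n int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.pending[key] += n
}

// CreationObserved is called when the informer cache sees a created Pod.
func (e *expectations) CreationObserved(key string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.pending[key] > 0 {
		e.pending[key]--
	}
}

// Satisfied reports whether every issued create has been observed, i.e.
// whether the cached Pod list can be trusted for this reconcile.
func (e *expectations) Satisfied(key string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.pending[key] == 0
}

// reconcileHead issues a head Pod create only if no earlier create is still
// in flight and the cache shows no head Pod. It returns true if it created one.
func reconcileHead(e *expectations, key string, cachedHeads int) bool {
	if !e.Satisfied(key) || cachedHeads > 0 {
		return false
	}
	e.ExpectCreations(key, 1)
	// A real controller would send the create request to the API server here.
	return true
}

func main() {
	e := newExpectations()
	key := "default/raycluster-kuberay"
	fmt.Println(reconcileHead(e, key, 0)) // true: first create is issued
	fmt.Println(reconcileHead(e, key, 0)) // false: cache is stale, but a create is pending
	e.CreationObserved(key)
	fmt.Println(reconcileHead(e, key, 1)) // false: cache now shows the head Pod
}
```

In `main`, the second reconcile sees a stale cache (zero head Pods) but is still blocked because one create is pending. Because this tracker lives in process memory, it only protects against duplicates that the controller itself can account for.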
Having multiple head Pods is a fatal error for a Ray cluster. If more than one Pod sits behind the head service, worker Pods may connect to different head Pods when the underlying connection uses the service name instead of the virtual IP. If a worker Pod ends up connecting to a different head Pod, it may be killed by that head Pod.
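A minimal sketch of how a controller might detect the multiple-head condition described above. It assumes head Pods carry `ray.io/cluster` and `ray.io/node-type: head` labels and that the head service selects on them. The `pod` struct stands in for `corev1.Pod` so the example runs without dependencies; `headPods` and `checkSingleHead` are hypothetical helpers, not the code referenced at line 736.

```go
package main

import "fmt"

// Label keys assumed for this sketch; treat the exact strings as assumptions.
const (
	clusterLabel  = "ray.io/cluster"
	nodeTypeLabel = "ray.io/node-type"
)

// pod is a minimal stand-in for corev1.Pod.
type pod struct {
	Name   string
	Labels map[string]string
}

// headPods returns the Pods labeled as head Pods of the given cluster,
// i.e. the Pods an assumed label-based head service selector would match.
func headPods(pods []pod, cluster string) []pod {
	var out []pod
	for _, p := range pods {
		if p.Labels[clusterLabel] == cluster && p.Labels[nodeTypeLabel] == "head" {
			out = append(out, p)
		}
	}
	return out
}

// checkSingleHead returns an error when more than one head Pod exists,
// because the head service would then route traffic to several heads.
func checkSingleHead(pods []pod, cluster string) error {
	heads := headPods(pods, cluster)
	if len(heads) > 1 {
		names := make([]string, 0, len(heads))
		for _, h := range heads {
			names = append(names, h.Name)
		}
		return fmt.Errorf("cluster %q has %d head Pods: %v", cluster, len(heads), names)
	}
	return nil
}

func main() {
	lbl := func(nodeType string) map[string]string {
		return map[string]string{clusterLabel: "raycluster-kuberay", nodeTypeLabel: nodeType}
	}
	pods := []pod{
		{Name: "raycluster-kuberay-head-abcde", Labels: lbl("head")},
		{Name: "raycluster-kuberay-head-fghij", Labels: lbl("head")},
		{Name: "raycluster-kuberay-worker-klmno", Labels: lbl("worker")},
	}
	// Prints an error naming both head Pods.
	fmt.Println(checkSingleHead(pods, "raycluster-kuberay"))
}
```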
Currently, the name of the head Pod follows the format raycluster-kuberay-head-xxxxx (raycluster-kuberay is the name of the RayCluster CR), so the Pod name is nondeterministic.
Two action items:
1. Make the name of the head Pod deterministic (e.g., raycluster-kuberay-head-xxxxx → raycluster-kuberay-head) so that, if these extreme cases occur, the K8s API server rejects the second creation request because a Pod with that name already exists.
2. Remove the code at kuberay/ray-operator/controllers/ray/raycluster_controller.go, lines 736 to 754 in 0d848f9.
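The effect of the first action item can be sketched with an in-memory stand-in for the API server, which, like the real one, rejects creating a Pod whose name already exists in the namespace. The sketch assumes the current `-xxxxx` suffix comes from `metadata.generateName` (the five-character suffix suggests this, but the issue does not say so) and uses a counter in place of the random suffix so the result is deterministic. `fakeAPIServer`, `headPodName`, and `raceCreateHead` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists mimics the API server's AlreadyExists (HTTP 409) response.
var errAlreadyExists = errors.New("AlreadyExists")

// fakeAPIServer rejects a create when a Pod with the same name exists.
type fakeAPIServer struct {
	pods map[string]bool
	seq  int
}

func newFakeAPIServer() *fakeAPIServer { return &fakeAPIServer{pods: map[string]bool{}} }

// create stores a Pod. If generateName is set, a suffix is appended so each
// request gets a new name (a counter stands in for the random suffix).
func (s *fakeAPIServer) create(name, generateName string) (string, error) {
	if generateName != "" {
		s.seq++
		name = fmt.Sprintf("%s%05d", generateName, s.seq)
	}
	if s.pods[name] {
		return "", fmt.Errorf("pods %q: %w", name, errAlreadyExists)
	}
	s.pods[name] = true
	return name, nil
}

// headPodName is a hypothetical helper producing the proposed fixed name.
func headPodName(cluster string) string { return cluster + "-head" }

// raceCreateHead simulates `attempts` reconciles that each try to create a
// head Pod, and counts how many creates succeeded or were rejected.
func raceCreateHead(s *fakeAPIServer, cluster string, deterministic bool, attempts int) (created, rejected int) {
	for i := 0; i < attempts; i++ {
		var err error
		if deterministic {
			_, err = s.create(headPodName(cluster), "")
		} else {
			_, err = s.create("", cluster+"-head-")
		}
		switch {
		case err == nil:
			created++
		case errors.Is(err, errAlreadyExists):
			rejected++
		}
	}
	return created, rejected
}

func main() {
	c, r := raceCreateHead(newFakeAPIServer(), "raycluster-kuberay", false, 2)
	fmt.Printf("generated names: created=%d rejected=%d\n", c, r) // created=2 rejected=0
	c, r = raceCreateHead(newFakeAPIServer(), "raycluster-kuberay", true, 2)
	fmt.Printf("deterministic name: created=%d rejected=%d\n", c, r) // created=1 rejected=1
}
```

With generated names both creates succeed and the cluster ends up with two head Pods; with a fixed name the second create fails with AlreadyExists. Uniqueness is then enforced by the API server's per-namespace name constraint rather than by controller-side bookkeeping alone.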
Use case
No response
Related issues
No response
Are you willing to submit a PR?