Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RayCluster] Make headpod name deterministic #3028

Merged

Conversation

owenowenisme
Copy link
Contributor

@owenowenisme owenowenisme commented Feb 12, 2025

Why are these changes needed?

Related issue number

Closes #3013

Checks

( raycluster-autoscaler is the rayCluster name from a random yaml I used for testing, which is not related to the issue.)

❯ kubectl get pods -l=ray.io/is-ray-node=yes
NAME                         READY   STATUS    RESTARTS   AGE
raycluster-autoscaler-head   2/2     Running   0          9m34s
  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Signed-off-by: owenowenisme <mses010108@gmail.com>
Signed-off-by: owenowenisme <mses010108@gmail.com>
@owenowenisme owenowenisme marked this pull request as ready for review February 12, 2025 10:55
Signed-off-by: owenowenisme <mses010108@gmail.com>
…adpod name is fixed now

Signed-off-by: owenowenisme <mses010108@gmail.com>
@owenowenisme
Copy link
Contributor Author

Tests that are failed by this changed will also be fixed in this PR.

Signed-off-by: owenowenisme <mses010108@gmail.com>
…add error log when multiple headpod is found

Signed-off-by: owenowenisme <mses010108@gmail.com>
Signed-off-by: owenowenisme <mses010108@gmail.com>
@kevin85421 kevin85421 self-assigned this Feb 12, 2025
Signed-off-by: owenowenisme <mses010108@gmail.com>
@owenowenisme owenowenisme force-pushed the make-headpod-name-deterministic branch from 1b991ed to 713ff51 Compare February 12, 2025 19:13
}
r.rayClusterScaleExpectation.ExpectScalePod(extraHeadPodToDelete.Namespace, instance.Name, expectations.HeadGroup, extraHeadPodToDelete.Name, expectations.Delete)
}
logger.Info("reconcilePods", fmt.Sprintf("Multiple head pods found, it should only exist one head pod. Please delete extra head pods and leave only the head pod with name `%s-head`.", instance.Name), headPods.Items)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. please don't use fmt.Sprintf inside log function. See https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/developer/logging-guidelines.md#message-guidelines for more details.

  2. please don't print headPods.Items. Instead, print the name of head Pods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if len(services.Items) > 1 {
logger.Info("reconcileHeadService", "Duplicate head service found", services.Items)
return fmt.Errorf("%d head service found %v", len(services.Items), services.Items)
}

I'm actually following the HeadService which also print the Items, should I change it as well?

Signed-off-by: owenowenisme <mses010108@gmail.com>
@owenowenisme
Copy link
Contributor Author

owenowenisme commented Feb 13, 2025

current err msg look like this

{"level":"info","ts":"2025-02-13T04:16:20.592Z","logger":"controllers.RayCluster",
"msg":"Multiple head pods found, it should only exist one head pod. Please delete extra head pods.",
"RayCluster":{"name":"rayservice-sample-raycluster-9wl9m","namespace":"test-ns-6zktp"},
"reconcileID":"6d93964f-00a4-4d78-8db6-0d40c53804e2",
"found pods":["mytest-head","rayservice-sample-raycluster-9wl9m-head"],
"should only leave":"rayservice-sample-raycluster-9wl9m-head"}
{"level":"info","ts":"2025-02-13T04:16:20.592Z",
"logger":"controllers.RayCluster",
"msg":"Reconciliation error",
"RayCluster":{"name":"rayservice-sample-raycluster-9wl9m","namespace":"test-ns-6zktp"},
"reconcileID":"6d93964f-00a4-4d78-8db6-0d40c53804e2",
"error":"2 head pods found [mytest-head rayservice-sample-raycluster-9wl9m-head]. Please delete extra head pods and leave only the head pod with name rayservice-sample-raycluster-9wl9m-head"}

Signed-off-by: owenowenisme <mses010108@gmail.com>
@owenowenisme
Copy link
Contributor Author

owenowenisme commented Feb 13, 2025

// Reconcile worker pods now
for _, worker := range instance.Spec.WorkerGroupSpecs {
if !r.rayClusterScaleExpectation.IsSatisfied(ctx, instance.Namespace, instance.Name, worker.GroupName) {
logger.Info("reconcilePods", "worker group", worker.GroupName, "Expectation", "NotSatisfiedGroupExpectations, reconcile the group later")
continue
}
// workerReplicas will store the target number of pods for this worker group.
var workerReplicas int32 = utils.GetWorkerGroupDesiredReplicas(ctx, worker)
logger.Info("reconcilePods", "desired workerReplicas (always adhering to minReplicas/maxReplica)", workerReplicas, "worker group", worker.GroupName, "maxReplicas", worker.MaxReplicas, "minReplicas", worker.MinReplicas, "replicas", worker.Replicas)
workerPods := corev1.PodList{}
if err := r.List(ctx, &workerPods, common.RayClusterGroupPodsAssociationOptions(instance, worker.GroupName).ToListOptions()...); err != nil {
return err
}

@kevin85421
I'm thinking that maybe we should change the var name from worker to something like workerGroup?
Which I think is more accurate.

BTW, I think the reconcile Worker for-loop in reconcilePod is pretty hard to read and realize for it is 150 lines and have to trace many function all over files, maybe we should refactor this part?

@owenowenisme owenowenisme changed the title Make headpod name deterministic [RayCluster] Make headpod name deterministic Feb 13, 2025
@kevin85421 kevin85421 merged commit b9d8b1a into ray-project:master Feb 14, 2025
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Refactor] Make head Pod name deterministic
2 participants