[Refactor][RayCluster] Unify status update to single place #2249

Conversation

MortalHappiness (Contributor) commented Jul 16, 2024

Why are these changes needed?

See the description in the corresponding issue for details.

Implementation details of this PR:

  • The updateClusterState and updateClusterReason functions are merged into a single function, updateRayClusterStatus.
  • The inconsistentRayClusterStatus check is moved into updateRayClusterStatus, so the status is only written when the new status is inconsistent with (i.e., differs from) the old one.
  • The calculateStatus function takes an additional argument, reconcileErr. If it is non-nil, Status.State is set to Failed and Status.Reason is set to the error's string representation.
  • Reconcile functions for each resource:
    • Before: each reconcile function was called in turn; on the first error, the status was updated and the reconciler returned immediately.
    • Now: the reconcile functions are stored in a slice and invoked in a loop; on the first error, the error is saved to the reconcileErr variable and handled once after the loop (see the sketch after this list).
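
A standalone sketch of the resulting control flow is shown below. The stand-in types (miniCluster, miniStatus) and the exact signatures are illustrative assumptions, not the real KubeRay code; only the overall shape — reconcile loop, calculateStatus(reconcileErr), and a single status write guarded by an inconsistency check — follows the description above.

```go
// Standalone sketch only: miniCluster/miniStatus are stand-in types and the
// helper signatures are assumptions; only the overall flow mirrors this PR.
package main

import (
	"errors"
	"fmt"
	"reflect"
)

type miniStatus struct {
	State  string
	Reason string
}

type miniCluster struct {
	Status miniStatus
}

// calculateStatus mirrors the new behavior: a non-nil reconcileErr sets the
// state to Failed and the reason to the error's string representation.
func calculateStatus(reconcileErr error) miniStatus {
	if reconcileErr != nil {
		return miniStatus{State: "Failed", Reason: reconcileErr.Error()}
	}
	return miniStatus{State: "Ready"}
}

// updateRayClusterStatus is the single place where the status is written; it
// only writes when the new status is inconsistent with the current one.
func updateRayClusterStatus(c *miniCluster, newStatus miniStatus) bool {
	if reflect.DeepEqual(c.Status, newStatus) {
		return false
	}
	c.Status = newStatus
	return true
}

func main() {
	cluster := &miniCluster{}

	// Reconcile functions live in a slice; the first error breaks the loop and
	// is carried in reconcileErr instead of triggering an immediate return.
	reconcileFuncs := []func(*miniCluster) error{
		func(*miniCluster) error { return nil },                          // e.g. reconcile services
		func(*miniCluster) error { return errors.New("pods not ready") }, // e.g. reconcile pods
	}

	var reconcileErr error
	for _, fn := range reconcileFuncs {
		if reconcileErr = fn(cluster); reconcileErr != nil {
			break
		}
	}

	if updateRayClusterStatus(cluster, calculateStatus(reconcileErr)) {
		fmt.Printf("status updated: %+v\n", cluster.Status)
	}
}
```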

Related issue number

Resolves: #2235

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@MortalHappiness marked this pull request as draft on July 16, 2024 12:09
@MortalHappiness force-pushed the feature/#2235-refacter-raycluster-status branch from 157ea19 to fad4424 on July 16, 2024 13:26
@MortalHappiness marked this pull request as ready for review on July 16, 2024 14:00
@kevin85421 self-requested a review on July 17, 2024 00:14
@kevin85421 self-assigned this on Jul 17, 2024
kevin85421 (Member) left a comment

Looks good! Would you mind adding a test in raycluster_controller_test.go to check the following:

  1. RayCluster becomes ready.
  2. RayCluster becomes failed.
  3. RayCluster returns to ready again.

// reconcileServeService
func(ctx context.Context, instance *rayv1.RayCluster) error {
	// Only reconcile the K8s service for Ray Serve when the "ray.io/enable-serve-service" annotation is set to true.
	if enableServeServiceValue, exist := instance.Annotations[utils.EnableServeServiceKey]; exist && enableServeServiceValue == utils.EnableServeServiceTrue {
kevin85421 (Member):

How about checking this inside reconcileServeService?

MortalHappiness (Contributor, Author):

Done

	}
	return nil
},
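
For illustration, a minimal sketch of what moving the check inside reconcileServeService could look like; the receiver and signature below are assumptions, and the merged diff is authoritative:

```go
// Sketch only: the receiver and signature are assumed for illustration.
func (r *RayClusterReconciler) reconcileServeService(ctx context.Context, instance *rayv1.RayCluster) error {
	// Only reconcile the K8s service for Ray Serve when the
	// "ray.io/enable-serve-service" annotation is set to true.
	if v, ok := instance.Annotations[utils.EnableServeServiceKey]; !ok || v != utils.EnableServeServiceTrue {
		return nil
	}
	// ... create or update the Ray Serve service as before ...
	return nil
}
```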
// reconcilePods
kevin85421 (Member):

How about using r.reconcilePods directly? We can remove the event for now.

MortalHappiness (Contributor, Author):

Done
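
A minimal sketch of this suggestion, assuming the slice elements share the func(context.Context, *rayv1.RayCluster) error shape; the list shown is illustrative, not exhaustive:

```go
// Sketch only: method values are referenced directly instead of being wrapped
// in closures; the elements shown are examples, not the full list.
reconcileFuncs := []func(context.Context, *rayv1.RayCluster) error{
	// ... other reconcile functions ...
	r.reconcileServeService,
	r.reconcilePods,
}
```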


for _, fn := range reconcileFuncs {
	if reconcileErr = fn(ctx, instance); reconcileErr != nil {
		break
kevin85421 (Member):

Add a log for reconcileErr?

MortalHappiness (Contributor, Author):

Done
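
A minimal sketch of the loop with the requested log; the logger variable and the message wording are assumptions:

```go
for _, fn := range reconcileFuncs {
	if reconcileErr = fn(ctx, instance); reconcileErr != nil {
		// Log here so the failing reconcile step is visible immediately; the
		// status itself is still written once, after the loop.
		logger.Error(reconcileErr, "Error reconciling resources", "cluster", instance.Name)
		break
	}
}
```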

Resolves: ray-project#2235
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
@MortalHappiness force-pushed the feature/#2235-refacter-raycluster-status branch from fad4424 to 0719b3a on July 17, 2024 09:24

MortalHappiness (Contributor, Author) commented Jul 17, 2024

Looks good! Would you mind adding a test in raycluster_controller_test.go to check the following:

  1. RayCluster becomes ready.
  2. RayCluster becomes failed.
  3. RayCluster returns to ready again.

Should these tests be added in this PR? This PR is for refactoring purposes only; it might be better to add these tests in a separate PR.

kevin85421 (Member) left a comment

LGTM!

@kevin85421 merged commit 0c99b42 into ray-project:master on Jul 17, 2024
25 checks passed
Successfully merging this pull request may close these issues:

[Refactor] REP 54: Refactor usages of the inconsistentRayClusterStatus function (#2235)