Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

skip interpret health of resources without a hook #5530

Merged
merged 1 commit into from
Oct 8, 2024

Conversation

a7i
Copy link
Contributor

@a7i a7i commented Sep 11, 2024

What type of PR is this?
/kind feature

What this PR does / why we need it:
In one of our small staging clusters, we get a lot of InterpretHealthFailed events (over 100k per week). After investigations, it seems to be resources such as ClusterRole and ClusterRoleBinding that never have a status, so there is nothing to inspect.

Screenshot 2024-09-11 at 4 21 34 PM

This is also repeated over and over again in karmada-controller-manager logs:

"Interpret health of object(ClusterRole//oncall) failed, err: default InterpretHealth interpreter for \"rbac.authorization.k8s.io/v1, Kind=ClusterRole\" not found." logger="events" type="Warning" object={"kind":"Work","namespace":"karmada-es-REDACTED","name":"oncall-76bd899f7f","uid":"7a93a59e-3196-4192-975c-ccdc3dd5e6f7","apiVersion":"work.karmada.io/v1alpha1","resourceVersion":"6603066766"} reason="InterpretHealthFailed"

**Which issue(s)

is PR fixes**:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: The health status of resources without ResourceInterpreter customization will be treated as healthy by default.

@karmada-bot karmada-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 11, 2024
@codecov-commenter
Copy link

codecov-commenter commented Sep 11, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 52.94118% with 8 lines in your changes missing coverage. Please review.

Project coverage is 35.22%. Comparing base (62ae95e) to head (dbed854).

Files with missing lines Patch % Lines
pkg/controllers/status/work_status_controller.go 52.94% 5 Missing and 3 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5530      +/-   ##
==========================================
+ Coverage   35.21%   35.22%   +0.01%     
==========================================
  Files         645      645              
  Lines       44885    44891       +6     
==========================================
+ Hits        15806    15814       +8     
+ Misses      27847    27842       -5     
- Partials     1232     1235       +3     
Flag Coverage Δ
unittests 35.22% <52.94%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@whitewindmills
Copy link
Member

@a7i
we cannot stop interpreting health for resources without status. see #4453.
anyway, this is definitely worth optimizing, you might consider skipping it if InterpretHealth is not enabled.

if !c.ResourceInterpreter.HookEnabled(resourceKey.GroupVersionKind(), configv1alpha1.InterpreterOperationInterpretHealth) {
	// skip interpreting health ....
}

Copy link
Member

@XiShanYongYe-Chang XiShanYongYe-Chang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable.
LGTM

pkg/resourceinterpreter/default/native/default.go Outdated Show resolved Hide resolved
@XiShanYongYe-Chang
Copy link
Member

we cannot stop interpreting health for resources without status. see #4453.

Sorry, I don't understand. The current modification does not skip the health parsing step, but directly considers health as true when the object does not have status.

@whitewindmills
Copy link
Member

whitewindmills commented Sep 12, 2024

The current modification does not skip the health parsing step, but directly considers health as true when the object does not have status.

it's fake health. we should avoid modifying the default interpreter if there are no new features or bug fixes, and we should have called HookEnabled before calling InterpretHealth, which would have solved the problem as well, wouldn't it?

@whitewindmills
Copy link
Member

in addition, it seems that we have no regulations that do not interpret the health of resources without status. maybe there will be such resources in the future.

@XiShanYongYe-Chang
Copy link
Member

in addition, it seems that we have no regulations that do not interpret the health of resources without status.

That's the point. So logically, this pr modification is correct, but there is an alternative, which is to call the HookEnabled method, which I think is OK. How do you think @a7i ?

@chaosi-zju
Copy link
Member

chaosi-zju commented Sep 12, 2024

we cannot stop interpreting health for resources without status. see #4453.

it seems that we have no regulations that do not interpret the health of resources without status. maybe there will be such resources in the future.

I agree with what @whitewindmills said, it is quite right.

What problem #4453 resolved is, Service kind resource, it has InterpretHealth hook, but its InterpretHealth hook has no chance to run due to its status always reflected as nil by reflectStatus.

Service kind resource is exactly the resource who should interpret the health but without status. If this PR introduced, it will again has no change to run InterpretHealth since it always interpreted as Healthy.


updated:

I made a mistake, Service actually has status field, it is just status reflected as nil (so this PR doesn't affect it).
But that doesn't mean there won't be such resources in the future.
Resource without InterpretHealth hook enabled be treated as healthy is more reasonable.

@RainbowMango
Copy link
Member

Thanks @a7i for bringing this up, and thanks to @whitewindmills for the reminder that we have revisited the IntepretHealth design at #4453 (comment).

The agreement we made at that time was: The resources which do not implement InterpretHealth should be treated as healthy by default. I was hoping to make some improvement after #4453, but unfortunately I forgot it! (Maybe next time I will require the track PR before approving :))

Speaking of the improvements, I agree with @whitewindmills that we should skip evaluating those resources that don't implement InterpretHealth, treat them as healthy by default, and should avoid unexpected, surprising warnings.

@a7i a7i force-pushed the skip-interpret-health branch from c8f45f0 to 8e50532 Compare September 13, 2024 16:54
@a7i a7i changed the title skip interpret health of resources without .status skip interpret health of resources without a hook Sep 13, 2024
@a7i
Copy link
Contributor Author

a7i commented Sep 13, 2024

Thanks for the feedback. Pushed up a commit, let me know what you think!

@a7i a7i force-pushed the skip-interpret-health branch from 8e50532 to ba77642 Compare September 13, 2024 18:11
} else if healthy {
resourceHealth = workv1alpha1.ResourceHealthy
c.EventRecorder.Eventf(work, corev1.EventTypeNormal, events.EventReasonInterpretHealthSucceed, "Interpret health of object(%s/%s/%s) as healthy.", clusterObj.GetKind(), clusterObj.GetNamespace(), clusterObj.GetName())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question, do we still need to report the event in the skip situation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, open to feedback. I personally would prefer if we didn't record the event

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No evidence shows we need it yet, we can skip the event for this PR.

@a7i
Copy link
Contributor Author

a7i commented Sep 17, 2024

Hi all, any feedback?

@whitewindmills
Copy link
Member

lgtm
/retest

@chaosi-zju
Copy link
Member

LGTM

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/assign

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me!
I like the idea that have a dedicated methods for healthy assessment very much! Thank you!

} else if healthy {
resourceHealth = workv1alpha1.ResourceHealthy
c.EventRecorder.Eventf(work, corev1.EventTypeNormal, events.EventReasonInterpretHealthSucceed, "Interpret health of object(%s/%s/%s) as healthy.", clusterObj.GetKind(), clusterObj.GetNamespace(), clusterObj.GetName())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No evidence shows we need it yet, we can skip the event for this PR.

pkg/controllers/status/work_status_controller.go Outdated Show resolved Hide resolved
Signed-off-by: Amir Alavi <amiralavi7@gmail.com>
@a7i a7i force-pushed the skip-interpret-health branch from ba77642 to dbed854 Compare October 5, 2024 15:10
Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 8, 2024
@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 8, 2024
@karmada-bot karmada-bot merged commit f656d9a into karmada-io:master Oct 8, 2024
12 checks passed
@a7i a7i deleted the skip-interpret-health branch October 8, 2024 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants