feat: Add webhooks client config service metrics #2114
Merged
What this PR does / why we need it:
To better identify, classify, and debug webhook latency issues, it is important to have a metric that points to the resource each webhook is responsible for. However, Kubernetes cannot expose that dimension in its own metrics, because such a label would have unbounded cardinality.
The webhook name could be an alternative, since it usually contains some information about the resource the webhook targets. However, this is not very practical in multi-tenant environments.
A solution for this kind of platform is to tie each webhook to a namespace, so that it is possible to know which tenant manages it and to act accordingly. This is achievable by leveraging the client config of webhooks defined in WebhookConfiguration resources: when the client config references a Service, that Service is a namespaced object.
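As a rough illustration of the idea, the sketch below derives a namespace for each webhook from its client config. It is not the code from this PR: the types are simplified stand-ins for the Kubernetes admission registration types, and the function name `serviceLabels` and the label names `webhook_name`, `service_name`, and `service_namespace` are assumptions made for the example. It also assumes that webhooks reached by URL, which have no Service, are skipped.

```go
package main

import "fmt"

// ServiceReference, ClientConfig and Webhook are simplified, hypothetical
// stand-ins for the Kubernetes admission registration types. Only the
// fields needed for this sketch are kept.
type ServiceReference struct {
	Namespace string
	Name      string
}

type ClientConfig struct {
	URL     *string           // set when the webhook is reached by URL
	Service *ServiceReference // set when the webhook is backed by a Service
}

type Webhook struct {
	Name         string
	ClientConfig ClientConfig
}

// serviceLabels returns one label set per Service-backed webhook.
// The label names are assumptions chosen for illustration.
func serviceLabels(webhooks []Webhook) []map[string]string {
	rows := make([]map[string]string, 0, len(webhooks))
	for _, w := range webhooks {
		svc := w.ClientConfig.Service
		if svc == nil {
			// Assumption: a URL-based webhook has no Service,
			// so there is no namespace to report for it.
			continue
		}
		rows = append(rows, map[string]string{
			"webhook_name":      w.Name,
			"service_name":      svc.Name,
			"service_namespace": svc.Namespace,
		})
	}
	return rows
}

func main() {
	url := "https://example.invalid/validate"
	webhooks := []Webhook{
		{
			Name: "pods.tenant-a.example.com",
			ClientConfig: ClientConfig{
				Service: &ServiceReference{Namespace: "tenant-a", Name: "admission"},
			},
		},
		{Name: "external.example.com", ClientConfig: ClientConfig{URL: &url}},
	}
	for _, row := range serviceLabels(webhooks) {
		fmt.Println(row["webhook_name"], row["service_namespace"], row["service_name"])
	}
}
```

Because each webhook yields at most one label set, a metric built this way grows linearly with the number of webhooks.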
With these new metrics, users will be able to split the alerting severity for webhook latency and rejection rate per namespace, in addition to splitting it by webhook name. This is key in environments where administrators don't have control over the webhooks installed by the various tenants.
How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)
Increases cardinality by O(webhooks).