-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ Provide a truly lazy restmapper #2116
Conversation
/hold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a worthwile improvement to the controller-runtime project, with large numbers of CRDs, we've seen rest mapping take a long time and cache large amounts of data that isn't really necessary.
I think there are a few improvements that could be made to make this a little cleaner, and one logical optimisation, see my comments and let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly agree with @JoelSpeed comments. Have just an extra nit about docstring for filteredServerGroupsAndResources
.
d06b559
to
2967f3b
Compare
7a302ea
to
cc59ac4
Compare
/hold cancel |
// LazyRESTMapper is a RESTMapper that will lazily query the provided | ||
// client for discovery information to do REST mappings. | ||
type LazyRESTMapper struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious if we though actually about making this RESTMapper the default implementation, truthfully it sounds like that the benefit of making less calls, and only making them when needed are something that a lot of users might want.
If we decide to go down this path, we should however provide a feature gate of sorts that enables this as an experiment first (opt in) -> on by default -> then enable it and remove the gate.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey! Initially I didn't plan to make it default but rather to use it either for clusters with a slow API server, or for very crd-pumped ones (ahem, OpenShift), when we have to do more than a hundred requests to initialize the mapper.
But after multiple optimizations I could say there is no worsening relative to the current implementation, only improvements. So, it's reasonable to make it default after some testing.
Do you have an example of such feature gating in this project, so I can mark this feature as experimental?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, we test this implementation with OpenShift here: openshift/cluster-cloud-controller-manager-operator#213
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd need to introduce a feature gate, which we currently don't have support for. For this PR, we can focus immediately to do some sort of split-logic between how things were done in the default RESTMapper, and this way.
If we feel comfortable though and have enough testing coverage all around, we can mark this a breaking change (
@alvaroaleman wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to simply add this restmapper for now and backport it into 0.14. 0.15 onwards (or later, depending on if we find issues) we can simply make it the default but leave the option to provide a custom one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have the implementation of this PR inlined with the current one, but all of it behind a feature gate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you featuregate this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new DynamicRESTMapperOption
that sets the flag could work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like: WithExperimentalLazyMapper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
Sorry, this is on my queue but I am a bit swamped right now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fedosin one comment to optimize this even further, other than that this looks good to me. Thanks a ton for your work and sorry it took so long.
// LazyRESTMapper is a RESTMapper that will lazily query the provided | ||
// client for discovery information to do REST mappings. | ||
type LazyRESTMapper struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging the two for next release sounds good to me.
I think the new lazy restmapper does everything the current dynamic one does btw. The current dynamic one will on a resource not found error reload the entire restmapping (with some ratelimiting). Since the new one will simply try to load the restmapping on-demand if it doesn't have it cached, that should solve the same problem just as well.
/cherrypick release-0.14 |
@alvaroaleman: once the present PR merges, I will cherry-pick it on top of release-0.14 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This commit adds a rest mapper that will lazily query the provided client for discovery information to do REST mappings.
Instead of creating the instance of the mapper directly, we will use WithExperimentalLazyMapper option for Dynamic Restmapper.
@alvaroaleman @vincepri hey folks! I updated this PR as you asked - now there is a "featuregate" called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks you so much!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alvaroaleman, Fedosin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@alvaroaleman: new pull request created: #2179 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
func (m *lazyRESTMapper) findAPIGroupByName(groupName string) (metav1.APIGroup, error) { | ||
// Ensure that required info about existing API groups is received and stored in the mapper. | ||
// It will make 2 API calls to /api and /apis, but only once. | ||
if m.apiGroups == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry very late :)
Q: We only ever seem to write m.apiGroups here. What if the apiGroups change after that? Or do we assume that's not relevant as the CRDs a controller uses must be deployed before the controller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @Fedosin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on the question above, as far as I can see the apiGroups
is never refreshed which might cause issues for groups that were installed after the mapper gets created AND if someone calls RESTMapping without a version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
Q: does it make sense to track this in an issue and makes sure this has an answer before pushing further the adoption of this mapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, the code should work for 97% of uses within controller runtime; but the above is definitely an issue we should fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference this has been addressed in #2208 and released in v0.14.5
provider-kubernetes initialized a new kubernetes client for each reconcile. The REST mapper in controller-runtime used to fetch information about every CRD in the cluster. controller-runtime introduced a lazy restmapper which means we don't have to introduce any complex caching to get a significant performance boost in provider-kubernetes: kubernetes-sigs/controller-runtime#2116 This seems to become the default in the next release: kubernetes-sigs/controller-runtime#2296 But this is so significant that we want to update now: * CPU reduced from constant throttling at 0.4 cores to 0.04 cores * CloudWatch / EKS audit log costs reduced significantly (55% for our cluster, with a lot of provider-kubernetes resources) Signed-off-by: Carl Henrik Lunde <chlunde@ifi.uio.no>
provider-kubernetes initialized a new kubernetes client for each reconcile. The REST mapper in controller-runtime used to fetch information about every CRD in the cluster. controller-runtime introduced a lazy restmapper which means we don't have to introduce any complex caching to get a significant performance boost in provider-kubernetes: kubernetes-sigs/controller-runtime#2116 This seems to become the default in the next release: kubernetes-sigs/controller-runtime#2296 But this is so significant that we want to update now: * CPU reduced from constant throttling at 0.4 cores to 0.04 cores * CloudWatch / EKS audit log costs reduced significantly (55% for our cluster, with a lot of provider-kubernetes resources) Signed-off-by: Carl Henrik Lunde <chlunde@ifi.uio.no>
provider-kubernetes initialized a new kubernetes client for each reconcile. The REST mapper in controller-runtime used to fetch information about every CRD in the cluster. controller-runtime introduced a lazy restmapper which means we don't have to introduce any complex caching to get a significant performance boost in provider-kubernetes: kubernetes-sigs/controller-runtime#2116 This seems to become the default in the next release: kubernetes-sigs/controller-runtime#2296 But this is so significant that we want to update now: * CPU reduced from constant throttling at 0.4 cores to 0.04 cores * CloudWatch / EKS audit log costs reduced significantly (55% for our cluster, with a lot of provider-kubernetes resources) Signed-off-by: Carl Henrik Lunde <chlunde@ifi.uio.no>
This PR adds a rest mapper that will lazily query the provided client for discovery information to do REST mappings. This allows you to significantly reduce the number of requests to the API server, as currently we do a request for each registered GroupVersion at startup. It also makes the startup time of the controller more predictable and independent of the number of CRDs installed in the system.
Fixes: #1603