Discovery is throttled when there are lots of resources (CRDs) #1126
/triage accepted
@justinsb commented on the original bug: Two contributory things:

1. I think there's a bug where `kubectl get clusterrolebindings.rbac.authorization.k8s.io` forces a cache invalidation and full rediscovery, but `kubectl get clusterrolebindings` does not. The gotcha is that tab completion uses the full name, and of course you're more likely to use tab completion when you have lots of CRDs, making discovery more expensive.
2. In general, we shouldn't have to do a full discovery to map GVK -> GVR. In the vast majority of cases I think we should be able to just do type discovery in the group and then find the K -> R mapping. (Daniel Smith shared something about OpenShift clusters having the G in GVK not equal to the G in GVR, but I would hope we could address that with a fallback.) This would be great for controllers, but I don't think it helps for kubectl, where we often need shortnames from the apiserver (and in general, kubectl is going to use more of the discovery surface than a controller that only needs GVK -> GVR mapping, so a full discovery is more justified... ideally we wouldn't invalidate it every time the user typed, though!)
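For readers following along, here is a minimal sketch of the per-group idea from point 2, using client-go's discovery client. It is not kubectl's real RESTMapper (which also handles shortnames, categories, and priority), and `mapKindToResource` is a hypothetical helper name, not an existing API:

```go
package main

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

// mapKindToResource resolves a GVK to a GVR by fetching discovery data for
// that single group/version, instead of running full cluster-wide discovery.
func mapKindToResource(dc discovery.DiscoveryInterface, gvk schema.GroupVersionKind) (schema.GroupVersionResource, error) {
	resources, err := dc.ServerResourcesForGroupVersion(gvk.GroupVersion().String())
	if err != nil {
		return schema.GroupVersionResource{}, err
	}
	for _, r := range resources.APIResources {
		// Skip subresources such as "clusterrolebindings/status".
		if strings.Contains(r.Name, "/") {
			continue
		}
		if r.Kind == gvk.Kind {
			return gvk.GroupVersion().WithResource(r.Name), nil
		}
	}
	return schema.GroupVersionResource{}, fmt.Errorf("no resource found for %s", gvk)
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	gvr, err := mapKindToResource(dc, schema.GroupVersionKind{
		Group: "rbac.authorization.k8s.io", Version: "v1", Kind: "ClusterRoleBinding",
	})
	fmt.Println(gvr, err)
}
```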
I just came here to raise this same issue. Over in @crossplane land we'd like to be able to install a lot of CRDs. You can think of Crossplane providers as controller managers that reconcile custom resources representing some set of "external" (to the API server where Crossplane is running) resources - e.g. SQL databases or GCP resources. The long tail of our providers are pretty small (<10 custom resource definitions), but the big ones currently correspond to entire cloud providers (similar to Google KCC) and are thus on the order of 700 custom resource definitions. Hypothetically, if someone wanted to install the AWS, GCP, and Azure providers all at the same time, that would be around 2,000 CRDs. We've considered breaking up our providers, since it's unlikely folks are actually using even half of (e.g.) all 700 AWS APIs, but the only real impetus for doing so is avoiding performance bottlenecks. Alongside expensive OpenAPI processing (kubernetes/kubernetes#105932), lengthy discovery is the next most annoying consequence of having many CRDs; we're seeing discovery sometimes take minutes.
As discussed in kubernetes/kubernetes#105520 we're disabling the discovery burst in kubectl for now
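Controllers or other client-go consumers hit the same client-side throttling during discovery, and the equivalent knobs live on `rest.Config`. This is only a sketch of that general approach, not the kubectl patch itself; the values are illustrative:

```go
package main

import (
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	// Raise the client-side limits so discovery across many group versions is
	// not throttled; 50 QPS / 300 burst mirrors the numbers discussed in this
	// issue, but any values can be used.
	config.QPS = 50
	config.Burst = 300

	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	// Full discovery fans out into one request per group version, which is
	// where the burst matters.
	_, _ = dc.ServerGroups()
}
```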
Thanks @soltysh! Can you help me understand how this is likely to be released - is it reasonable to expect it to be backported to previous versions of kubectl?
@negz this will be part of the 1.23 release, and as you're saying, I wouldn't personally consider this a backportable bug.
@kbudde https://github.com/kubernetes/sig-release/blob/master/release-engineering/role-handbooks/patch-release-team.md#cherry-pick-requests documents the process.

@justinsb is there another open issue for this, or is it just inside a comment in this closed issue? "I think there's a bug where kubectl get clusterrolebindings.rbac.authorization.k8s.io forces a cache invalidation and full rediscovery..." Personally, I would hope for a full name to just fetch information for the single resource type and always be fast, even when the cache is stale, unless it's possible for a shortname or category to also call itself `clusterrolebindings.rbac.authorization.k8s.io`.
- Bump discovery burst to 300 (hard-coded): kubernetes/kubectl#1126, https://github.com/kubernetes/kubernetes/pull/105520/files
- Enlarge TTL of discovery cache to 6 hours: kubernetes#107130, https://github.com/kubernetes/kubernetes/pull/107141/files
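The TTL change applies to kubectl's on-disk discovery cache. As a rough sketch of what a similarly long-lived cached discovery client looks like with client-go (the cache directory layout here is made up for the example and does not match kubectl's exact paths):

```go
package main

import (
	"path/filepath"
	"time"

	diskcached "k8s.io/client-go/discovery/cached/disk"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	// Illustrative cache location; kubectl keeps its own layout under ~/.kube/cache.
	cacheDir := filepath.Join(clientcmd.RecommendedConfigDir, "cache")

	// A long TTL means cached discovery data is reused across invocations
	// instead of being re-fetched every time; 6 hours matches the PR above.
	cached, err := diskcached.NewCachedDiscoveryClientForConfig(
		config,
		filepath.Join(cacheDir, "discovery"),
		filepath.Join(cacheDir, "http"),
		6*time.Hour,
	)
	if err != nil {
		panic(err)
	}
	_, _ = cached.ServerGroups()
}
```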
same issue:
From k/k: kubernetes/kubernetes#105489
The default client-side rate limit is 50 QPS with a burst of 100.
Kubectl does a discovery pass before doing the operation requested. Discovery involves (some combination of) reading the OpenAPI spec and the "homegrown" discovery information for the purpose of, e.g., mapping kinds to resources. Sometimes these can be cached, but even when they are, an API call can be required to verify the cache hasn't gone stale.
This means that if there are more than 100 group versions, we will hit the client-side rate limit in the process of discovery, possibly slowing it down significantly.
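For a rough sense of scale, assuming one discovery request per group version: a cluster with 300 group versions issues about 300 requests; the first 100 are absorbed by the burst, and the remaining 200 drain at 50 requests per second, adding roughly 4 seconds to every cold-cache invocation.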
AFAIK there is no way to adjust this limit when using kubectl.
A user report can be found here: kubernetes/kubernetes#101634 (comment)
An easy way to fix this is to disable the client-side limit in kubectl. Possibly the caching parameters could be adjusted, too. A more difficult fix would be to load only the minimum set of discovery documents necessary to perform the task at hand (but some tasks require everything).
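As a sketch of what "disable the client-side limit" can look like for a client-go consumer (this is not kubectl's actual change), an explicit no-op rate limiter overrides the QPS/Burst settings:

```go
package main

import (
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	// Setting an explicit RateLimiter takes precedence over QPS/Burst; the
	// "fake always" limiter never blocks, which effectively disables
	// client-side throttling and leaves protection to server-side mechanisms
	// such as API Priority and Fairness.
	config.RateLimiter = flowcontrol.NewFakeAlwaysRateLimiter()

	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	_, _ = dc.ServerGroups()
}
```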