Discovery is throttled when there are lots of resources (CRDs) #1126

Closed
seans3 opened this issue Oct 5, 2021 · 11 comments · Fixed by kubernetes/kubernetes#105520
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments


seans3 commented Oct 5, 2021

From k/k: kubernetes/kubernetes#105489

The default client-side rate limit is 50 QPS with a burst of 100.

Kubectl does a discovery pass before performing the requested operation. Discovery involves (some combination of) reading the OpenAPI spec and the "homegrown" discovery information, for example to map kinds to resources. These documents can sometimes be cached, but even when they are cached, an API call can be required to verify they haven't changed.

This means that if there are more than 100 group versions, we will hit the client-side rate limit during discovery, which can significantly slow down the operation.
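
For illustration, here is a minimal client-go sketch (not kubectl's actual code path) of where these limits live: the QPS and Burst fields on rest.Config feed a token-bucket rate limiter, and a full discovery pass issues roughly one request per group version, so a cluster with more than ~100 group versions drains the burst and requests start queueing.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load a kubeconfig much as kubectl would.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}

	// The defaults described above: a token bucket of 100, refilled at 50/s.
	config.QPS = 50
	config.Burst = 100

	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}

	// Full discovery fans out roughly one request per group version; with
	// hundreds of CRD group versions this is where the throttling shows up.
	_, resources, err := dc.ServerGroupsAndResources()
	if err != nil {
		panic(err)
	}
	fmt.Printf("discovered %d group versions\n", len(resources))
}
```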

AFAIK there is no way to adjust this limit when using kubectl.

A user report can be found here: kubernetes/kubernetes#101634 (comment)

An easy way to fix this is to disable the client-side limit in kubectl. Possibly the caching parameters could be adjusted, too. A more difficult fix would be to load only the minimum set of discovery documents necessary to perform the task at hand (but some tasks require everything).
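
As a rough sketch of the "disable the client-side limit" option (an illustration against client-go, not the actual kubectl patch): once an explicit RateLimiter is set on rest.Config, client-go ignores the QPS and Burst fields, leaving throttling to server-side priority and fairness. The helper name below is hypothetical.

```go
package main

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/util/flowcontrol"
)

// withoutClientSideThrottling is a hypothetical helper: with RateLimiter set
// explicitly, client-go ignores QPS and Burst, so only server-side priority
// and fairness limits the client.
func withoutClientSideThrottling(config *rest.Config) *rest.Config {
	config.RateLimiter = flowcontrol.NewFakeAlwaysRateLimiter()
	return config
}

func main() {
	cfg := withoutClientSideThrottling(&rest.Config{Host: "https://example.invalid"})
	_ = cfg
}
```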

@seans3 seans3 added the kind/bug Categorizes issue or PR as related to a bug. label Oct 5, 2021
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 5, 2021

seans3 commented Oct 5, 2021

/assign @lavalamp
/assign @justinsb
/assign


seans3 commented Oct 5, 2021

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 5, 2021

justinsb commented Oct 5, 2021

Copied from the original bug:

Two contributory things:

  • I think there's a bug where `kubectl get clusterrolebindings.rbac.authorization.k8s.io` forces a cache invalidation and full rediscovery, but `kubectl get clusterrolebindings` does not. The gotcha is that tab completion uses the full name, and of course you're more likely to use tab completion when you have lots of CRDs, making discovery more expensive.

  • In general, we shouldn't have to do a full discovery to map GVK -> GVR. In the vast majority of cases I think we should be able to just do type discovery in the group and then find the K -> R (@lavalamp shared something about OpenShift clusters having the G in GVK not equal to the G in GVR, but I would hope we could address that with a fallback); a rough sketch of that per-group lookup follows below. This would be great for controllers, but I don't think it helps as much for kubectl, where we often need shortnames from the apiserver (and in general, kubectl is going to use more of the API surface than a controller that only needs GVK -> GVR mapping, so a full discovery is more justified ... ideally we wouldn't invalidate it every time the user typed <tab> though!)
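
For illustration only, here is a rough sketch (assuming the resource lives in the same group/version as its kind, which the OpenShift caveat above shows is not always true) of resolving a GVK to a GVR by discovering just that one group version instead of the whole cluster:

```go
package main

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

// resourceForKind resolves a GVK to its GVR by discovering only the GVK's own
// group/version; a full-discovery fallback would still be needed when the
// resource's group differs from the kind's group.
func resourceForKind(dc discovery.DiscoveryInterface, gvk schema.GroupVersionKind) (schema.GroupVersionResource, error) {
	list, err := dc.ServerResourcesForGroupVersion(gvk.GroupVersion().String())
	if err != nil {
		return schema.GroupVersionResource{}, err
	}
	for _, r := range list.APIResources {
		// Skip subresources such as "pods/status", which share the parent's kind.
		if strings.Contains(r.Name, "/") {
			continue
		}
		if r.Kind == gvk.Kind {
			return gvk.GroupVersion().WithResource(r.Name), nil
		}
	}
	return schema.GroupVersionResource{}, fmt.Errorf("kind %s not found in %s", gvk.Kind, gvk.GroupVersion())
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	gvr, err := resourceForKind(dc, schema.GroupVersionKind{
		Group: "rbac.authorization.k8s.io", Version: "v1", Kind: "ClusterRoleBinding",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(gvr) // rbac.authorization.k8s.io/v1, Resource=clusterrolebindings
}
```

Note that this only covers the K -> R mapping; it would not surface shortnames or categories, which is part of why kubectl still wants the full discovery documents.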


negz commented Oct 28, 2021

I just came here to raise this same issue. Over in @crossplane land we'd like to be able to install a lot of CRDs. You can think of Crossplane providers as controller managers that reconcile custom resources representing some set of "external" (to the API server where Crossplane is running) resources - e.g. SQL databases or GCP resources. The long tail of our providers is pretty small (<10 custom resource definitions), but the big ones currently correspond to entire cloud providers (similar to Google KCC) and are thus on the order of 700 custom resource definitions. Hypothetically, if someone wanted to install the AWS, GCP, and Azure providers all at the same time, that would be around 2,000 CRDs.

We've considered breaking up our providers, since it's unlikely folks are actually using even half of (e.g.) all 700 AWS APIs, but the only real impetus for doing so is avoiding performance bottlenecks. Alongside expensive OpenAPI processing (kubernetes/kubernetes#105932), lengthy discovery is the next most annoying consequence of having many CRDs; we're seeing discovery sometimes take minutes.


soltysh commented Oct 29, 2021

As discussed in kubernetes/kubernetes#105520, we're disabling the discovery burst in kubectl for now.


negz commented Oct 29, 2021

Thanks @soltysh! Can you help me understand how this is likely to be released - is it reasonable to expect it to be backported to previous versions of kubectl? It's not intuitive to me because on one hand it's kind of a bug fix, while on the other hand it's quite a big behaviour change (and kind of depends on APF).


soltysh commented Nov 2, 2021

@negz this will be part of the 1.23 release, and as you're saying, I wouldn't personally consider this a backportable fix.


kbudde commented Jan 14, 2022

@negz, @soltysh as far as I can see the limit should have been raised to 300 (#105520), but the fix doesn't work in 1.23 (see #107131). It is now part of the 1.24 release.
What is the process to get this change backported into older versions?


chlunde commented Jan 24, 2022

@kbudde https://github.com/kubernetes/sig-release/blob/master/release-engineering/role-handbooks/patch-release-team.md#cherry-pick-requests documents the process.

@justinsb is there another open issue for this, or is it just inside a comment in this closed issue? "I think there's a bug where kubectl get clusterrolebindings.rbac.authorization.k8s.io forces a cache invalidation and full rediscovery..." Personally, I would hope that using a full name would just fetch information for that single resource type and always be fast, even without a warm cache, unless it's possible for a shortname or category to also call itself clusterrolebindings.rbac.authorization.k8s.io.


kong62 commented May 8, 2023

same issue:

# kubectl drain --ignore-daemonsets --delete-emptydir-data --delete-local-data --grace-period=80 10.44.5.158
I0306 20:48:56.890979   12500 request.go:665] Waited for 1.126157702s due to client-side throttling, not priority and fairness, request: GET:https://10.44.6.13:60002/api/v1/namespaces/prod/pods/caesar-remotequery-msv-20220803145002-b08f661-784bbbd85b-6nnvw
I0306 20:49:06.891933   12500 request.go:665] Waited for 1.135381322s due to client-side throttling, not priority and fairness, request: GET:https://10.44.6.13:60002/api/v1/namespaces/prod/pods/basketball-pc-web-20230303181501-7cc54b4-6d65996dd6-225n9
