Lots of discovery cache throttling with many CRDs #2084
Comments
Recently we faced not identical, but similar issues, because discovery is quite bursty in terms of network requests. I wonder what people think about revisiting the discovery process and making it lazy? Or, alternatively, providing some knobs to configure specific groups for initial discovery and to ignore groups that are not needed.
I understand this as a proposal to increase the default number of requests per second for the discovery procedure, am I right?
And that is an issue? So you are saying the original ask here (a higher rate limit for discovery) would be a problem for you?
There is an open issue for this, #1603, but someone has to actually do the work, and that hasn't happened yet. Since we get the group version from the scheme, I think it should be possible, but no one has done it yet. Maybe I am also missing something and it is not actually possible. IIRC there was also some work going on upstream to provide a single discovery document rather than the current hierarchy, which would also alleviate this issue.
This is the upstream sig-apimachinery KEP I was referring to: https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/3352-aggregated-discovery
We faced controller crashes during discovery because of network issues. Presumably, in our case, this was caused by the LB configuration set up in front of the API server. I'm thinking about clusters on top of on-prem-ish platforms such as vSphere, and use cases that involve plenty of controller restarts (OpenShift upgrades :P). Anyway, @alvaroaleman, many thanks for the info about the prior work! Let's move conversations about lazy discovery into the respective issue.
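For illustration, here is a minimal sketch of the lazy-discovery idea discussed above, assuming client-go's discovery and restmapper packages. The names lazyMapper and discoverGroup are hypothetical and this is not controller-runtime's actual implementation; it only shows how mappings could be resolved per API group on first use instead of discovering everything up front.

```go
// Sketch only: resolve REST mappings per API group on first use.
// Names (lazyMapper, discoverGroup) are illustrative, not real controller-runtime code.
package lazymapper

import (
	"sync"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/restmapper"
)

type lazyMapper struct {
	mu    sync.Mutex
	dc    discovery.DiscoveryInterface
	known map[string][]*restmapper.APIGroupResources // discovered groups, keyed by group name
}

func newLazyMapper(dc discovery.DiscoveryInterface) *lazyMapper {
	return &lazyMapper{dc: dc, known: map[string][]*restmapper.APIGroupResources{}}
}

// RESTMapping discovers only the requested group the first time it is asked for.
func (m *lazyMapper) RESTMapping(gk schema.GroupKind, versions ...string) (*meta.RESTMapping, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.known[gk.Group]; !ok {
		grs, err := discoverGroup(m.dc, gk.Group)
		if err != nil {
			return nil, err
		}
		m.known[gk.Group] = grs
	}
	var all []*restmapper.APIGroupResources
	for _, grs := range m.known {
		all = append(all, grs...)
	}
	return restmapper.NewDiscoveryRESTMapper(all).RESTMapping(gk, versions...)
}

// discoverGroup lists all groups (one request) and then fetches resources only
// for the versions of the single group we care about.
func discoverGroup(dc discovery.DiscoveryInterface, group string) ([]*restmapper.APIGroupResources, error) {
	groups, err := dc.ServerGroups()
	if err != nil {
		return nil, err
	}
	var out []*restmapper.APIGroupResources
	for i := range groups.Groups {
		g := groups.Groups[i]
		if g.Name != group {
			continue
		}
		agr := &restmapper.APIGroupResources{
			Group:              g,
			VersionedResources: map[string][]metav1.APIResource{},
		}
		for _, v := range g.Versions {
			rl, err := dc.ServerResourcesForGroupVersion(v.GroupVersion)
			if err != nil {
				return nil, err
			}
			agr.VersionedResources[v.Version] = rl.APIResources
		}
		out = append(out, agr)
	}
	return out, nil
}
```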
Looking through the upstream codebase, I think we actually don't need to change anything in the current code for …. It looks like we use …, and the same goes for the Dynamic one, which internally calls …. A check is made for ….
When using controller-runtime on a cluster with many CRDs installed, I see many client-side throttling log lines such as these: …

This has been a larger issue in the Kubernetes community which also affected kubectl, and therefore even more end users. It's arguably more annoying to get slowed down for every kubectl command than to have a slowed-down controller running in the background. In kubectl it has been fixed via .WithDiscoveryBurst(300).WithDiscoveryQPS(50.0) (source) and by extending the TTL of the discovery cache.

However, controller-runtime uses the same config options for regular API calls and the discovery client. We should implement the same fix as in kubectl to allow for higher rate limits during discovery and speed up controller reconciliation.

cc @negz
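As a rough sketch of the proposed direction (not a confirmed controller-runtime API): build the REST mapper from a copy of the rest.Config with higher QPS/Burst, so only the discovery path gets the relaxed limits while regular API calls keep theirs. The MapperProvider wiring and the apiutil.NewDynamicRESTMapper signature below are assumptions that may differ between controller-runtime versions.

```go
// Sketch only: give the discovery/REST-mapper path a more generous rate limit
// than regular API calls. Exact signatures (apiutil.NewDynamicRESTMapper,
// manager Options.MapperProvider) may differ between controller-runtime versions.
package main

import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/rest"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client/apiutil"
)

func main() {
	cfg := ctrl.GetConfigOrDie()

	mgr, err := ctrl.NewManager(cfg, ctrl.Options{
		// Hand the manager a mapper built from a dedicated config copy with
		// higher limits, mirroring kubectl's WithDiscoveryBurst/WithDiscoveryQPS.
		MapperProvider: func(c *rest.Config) (meta.RESTMapper, error) {
			discoveryCfg := rest.CopyConfig(c)
			discoveryCfg.QPS = 50.0
			discoveryCfg.Burst = 300
			return apiutil.NewDynamicRESTMapper(discoveryCfg)
		},
	})
	if err != nil {
		panic(err)
	}
	_ = mgr // set up controllers and call mgr.Start(ctrl.SetupSignalHandler()) as usual
}
```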