Configurable Kubernetes Client Throttling/Rate Limiting #1704

Closed
AaronFriel opened this issue Sep 8, 2021 · 0 comments · Fixed by #1748
Labels
impact/performance Something is slower than expected · kind/enhancement Improvements or new features · resolution/fixed This issue was fixed

Comments

AaronFriel (Contributor) commented Sep 8, 2021

Affected area

All Pulumi operations on Kubernetes resources; specifically, the construction of clients from the Kubernetes client-go package.

Based on logs emitted during deploys, it appears that Pulumi is using a default rate-limiting configuration that is too conservative. In particular, client-go already supports server-side rate-limit detection and retries, and Pulumi implements its own retry mechanism on top of that, so aggressive client-side throttling seems unnecessary.

I tried to follow the client-go/dynamic source code to figure out where the rate limiter is specified, but Go is not a language I work with regularly. It does seem like there ought to be a place to swap out the RateLimiter used.

I think the relevant place is this dynamic.NewForConfig call:

client, err := dynamic.NewForConfig(clientConfig)
if err != nil {
	return nil, fmt.Errorf("failed to initialize dynamic client: %v", err)
}

Before that call, Pulumi may be able to supply a QPS and Burst higher than the client-go defaults of 5 and 10, respectively: https://github.com/kubernetes/client-go/blob/f6ce18ae578c8cca64d14ab9687824d9e1305a67/rest/config.go#L115-L121
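
As a minimal sketch of what that could look like, assuming clientConfig is the *rest.Config the provider already builds (the values 50 and 100 are placeholders, not recommendations):

// Raise the client-side rate limit before constructing the dynamic client.
// client-go's DefaultQPS is 5 and DefaultBurst is 10.
clientConfig.QPS = 50    // placeholder value
clientConfig.Burst = 100 // placeholder value

client, err := dynamic.NewForConfig(clientConfig)
if err != nil {
	return nil, fmt.Errorf("failed to initialize dynamic client: %v", err)
}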

Environment variables such as PULUMI_KUBE_QPS and PULUMI_KUBE_BURST would be perfect for us to evaluate the change, though I imagine the Pulumi team has test clusters on the major platforms that could be used to evaluate better defaults.
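
A rough sketch of that override (the variable names are only the suggestion above, not existing Pulumi settings; uses os and strconv from the standard library):

// Optionally override QPS/Burst from the proposed environment variables.
if v, ok := os.LookupEnv("PULUMI_KUBE_QPS"); ok {
	if qps, err := strconv.ParseFloat(v, 32); err == nil {
		clientConfig.QPS = float32(qps)
	}
}
if v, ok := os.LookupEnv("PULUMI_KUBE_BURST"); ok {
	if burst, err := strconv.Atoi(v); err == nil {
		clientConfig.Burst = burst
	}
}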

A moderate increase could dramatically reduce deploy times while work is ongoing to reduce the number of API calls necessary by sharing informers / watches on resources a la #1639.
