
🐛 make clusterctl REST client throttling configurable #8411

Closed · wants to merge 1 commit

Conversation

Contributor

@nojnhuh nojnhuh commented Mar 28, 2023

What this PR does / why we need it:

When running commands like clusterctl init against a cluster with many CRDs installed, the API discovery process frequently gets throttled client-side, which makes operations take significantly longer. On a fresh kind cluster, clusterctl init took about 30s for me, but with ~100 CRDs installed it took 5-6m. In a blog post from Upbound (maintainers of Crossplane), they mention running into the same problem with kubectl and contributing a fix upstream to bump the rest.Config's Burst from 100 to 300. With the same change applied here, the same clusterctl init command goes from 5-6m to under 1m, with minimal log messages indicating client-side throttling.
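For context, the change itself is tiny; a minimal sketch (helper name illustrative, not the actual clusterctl code) of raising the client-side limits on a client-go rest.Config:

```go
package client

import "k8s.io/client-go/rest"

// adjustThrottling raises the client-side rate limits before any clients are
// built from cfg. client-go falls back to QPS=5 and Burst=10 when these are
// left at zero; Burst=300 matches the kubectl fix referenced above
// (kubernetes/kubernetes#105520). The QPS value here is illustrative.
func adjustThrottling(cfg *rest.Config) *rest.Config {
	cfg.QPS = 50
	cfg.Burst = 300
	return cfg
}
```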

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added the area/clusterctl Issues or PRs related to clusterctl label Mar 28, 2023
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 28, 2023
Contributor

@killianmuldoon killianmuldoon left a comment

This seems like a good idea to me - one question about the use case though - as long as you don't mind sharing 🙂

Is there a reason there are so many additional CRDs installed before clusterctl init?

@nojnhuh
Contributor Author

nojnhuh commented Mar 28, 2023

This seems like a good idea to me - one question about the use case though - as long as you don't mind sharing 🙂

Is there a reason there are so many additional CRDs installed before clusterctl init?

Sure thing. In CAPZ we're considering adopting Azure Service Operator (ASO) to create/update/delete individual resources in Azure (CAPZ proposal), and ASO defines one CRD per Azure resource type. Right now there's no great way to install only the ASO CRDs we need.

@killianmuldoon
Contributor

Thanks for explaining - definitely seems reasonable. The only question is whether to default the burst higher or make it configurable with a flag / config file. I think clusterctl maintainers and reviewers will have clearer opinions on that.

@CecileRobertMichon
Contributor

Should this be classified as a bug fix and considered for cherry-pick?

@CecileRobertMichon
Contributor

CecileRobertMichon commented Mar 29, 2023

The only question is whether to default the burst higher or make it configurable with a flag / config file. I think clusterctl maintainers and reviewers will have clearer opinions on that.

We could do both? I think a higher default is sane, given that a user installing several infra providers could quickly run into these limits (and the current value is lower than the kubectl default, kubernetes/kubernetes#105520).

@killianmuldoon
Contributor

Should this be classified as a bug fix and considered for cherry-pick?

I think it's a good idea to cherry-pick for sure. @nojnhuh could you add a command line flag making this configurable?

@nojnhuh
Contributor Author

nojnhuh commented Mar 30, 2023

I think it's a good idea to cherry-pick for sure. @nojnhuh could you add a command line flag making this configurable?

Yep, will do. Should I do the same for the rest config's QPS since that's closely related?

@killianmuldoon
Contributor

killianmuldoon commented Mar 30, 2023

Yep, will do. Should I do the same for the rest config's QPS since that's closely related?

Sounds good to me.

@nojnhuh
Contributor Author

nojnhuh commented Mar 30, 2023

Had to add most of the plumbing to expose these as CLI flags, which kinda blew up this PR. Definitely took a hit to my keyboard's Cmd, C, and V keys, but it works at least.

/retitle 🐛 make clusterctl REST client throttling configurable
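A rough sketch of the kind of flag plumbing involved, with hypothetical flag names, not necessarily what this PR adds:

```go
package cmd

// Sketch only: initCmd stands in for clusterctl's existing `init` cobra
// command, and the flag names below are hypothetical.
var (
	restQPS   float32
	restBurst int
)

func init() {
	initCmd.Flags().Float32Var(&restQPS, "kube-api-qps", 50,
		"Maximum queries per second from clusterctl to the management cluster API server")
	initCmd.Flags().IntVar(&restBurst, "kube-api-burst", 300,
		"Maximum burst of queries from clusterctl to the management cluster API server")
}
```

The values then have to be threaded from the command layer down to wherever the rest.Config is built, which is what inflates the diff.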

@k8s-ci-robot k8s-ci-robot changed the title 🌱 increase clusterctl client throttle burst 🐛 make clusterctl REST client throttling configurable Mar 30, 2023
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 30, 2023
@nojnhuh
Contributor Author

nojnhuh commented Apr 3, 2023

@killianmuldoon Thanks for taking a look so far! Could you help assign folks you think might be good to review this?

Contributor

@CecileRobertMichon CecileRobertMichon left a comment

/cc @Jont828

Contributor

@CecileRobertMichon CecileRobertMichon left a comment

lgtm

Is there anywhere in the book these options should be documented?

@nojnhuh
Contributor Author

nojnhuh commented Apr 4, 2023

I don't see anything like generated documentation for every flag on every command, so maybe the built-in --help is enough here?

Contributor

@ykakarap ykakarap left a comment

Increasing the defaults to match upstream (kubectl) sounds reasonable.

The overall approach looks good. I will test it later this week.

I don't see anything like generated documentation for every flag on every command, so maybe the built-in --help is enough here?

Correct. We do not have any explicit documentation for the flags of all the commands. The flag descriptions included in the built-in --help are enough.

@nojnhuh
Contributor Author

nojnhuh commented Apr 11, 2023

/assign @ykakarap

Friendly ping on this if you wouldn't mind taking a look.

Resolved review threads: cmd/clusterctl/cmd/delete.go (×3, outdated), cmd/clusterctl/cmd/move.go (outdated), cmd/clusterctl/cmd/topology_plan.go
@fabriziopandini
Member

/lgtm
Happy to approve as soon as @ykakarap, @Jont828, or other maintainers/reviewers in this area give a +1

@nojnhuh
Contributor Author

nojnhuh commented Apr 19, 2023

Test flake already has an issue open: #8478

@sbueringer
Member

sbueringer commented Apr 22, 2023

I wonder if an alternative to the current implementation in this PR is to use the lazy rest mapper that CR introduced as an experimental feature (and which we exposed as an experimental feature on our controllers).

The lazy restmapper only discovers CRDs on demand, when they are actually used. This way we could simply reduce the number of requests instead of changing the throttling.

xref:
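For reference, a sketch of what opting in looked like at the time, assuming controller-runtime v0.14's experimental apiutil.WithExperimentalLazyMapper option (the lazy behavior became the default in v0.15):

```go
package client

import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client/apiutil"
)

// newLazyMapper builds a RESTMapper that performs discovery per GroupVersion
// on first use instead of walking every API group up front, which is the
// request pattern that trips client-side throttling with ~100 CRDs installed.
func newLazyMapper(cfg *rest.Config) (meta.RESTMapper, error) {
	return apiutil.NewDynamicRESTMapper(cfg, apiutil.WithExperimentalLazyMapper)
}
```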

@nojnhuh
Contributor Author

nojnhuh commented May 1, 2023

@sbueringer Do you think it might make sense to integrate the lazy rest mapper separately so users can take advantage of the increased throttling thresholds without having to opt in to experimental functionality?

@sbueringer
Member

sbueringer commented May 2, 2023

I'm fine with it but I would like to leave the decision to clusterctl maintainers.

I just thought it's better to make configurable throttling unnecessary by optimizing the calls. While the lazy restmapper is experimental, we've been using it for ~1-2 months in all our e2e test jobs (1.4 + main, IIRC) without any problems.

It also looks like it would help us avoid making changes in a lot of places just to make throttling configurable.

To be honest, given what clusterctl is doing (deploying providers), this doesn't feel like something we should need to make configurable.

@sbueringer
Member

^^ @vincepri WDYT?

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 2, 2023
@k8s-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@sbueringer
Member

sbueringer commented May 2, 2023

Thought experiment:

Let's assume the lazy restmapper becomes the default/only restmapper in CR v0.15 (to which we will bump in 2 weeks). Would we still want the flags on top of that?

@nojnhuh
Contributor Author

nojnhuh commented May 2, 2023

@ykakarap @Jont828 Thoughts on using the lazy restmapper instead of or in addition to these changes?

@sbueringer
Member

Context: I'm asking because we're about to make the lazy restmapper the default restmapper: kubernetes-sigs/controller-runtime#2296

@fabriziopandini
Member

Given that it's becoming the default in two weeks, the lazy rest mapper makes more and more sense...

@sbueringer
Member

We'll have it as soon as #8007 is merged, so we can try out how it changes the situation for clusterctl.

@nojnhuh
Contributor Author

nojnhuh commented May 9, 2023

@sbueringer Do you expect that as soon as controller-runtime is bumped, then CAPI will automatically start using the lazy restmapper and this issue I was originally running into might go away entirely? Or will there be additional work to take advantage of the lazy restmapper?

@ykakarap
Contributor

ykakarap commented May 9, 2023

@sbueringer Do you expect that as soon as controller-runtime is bumped, then CAPI will automatically start using the lazy restmapper and this issue I was originally running into might go away entirely? Or will there be additional work to take advantage of the lazy restmapper?

Looks like after bumping controller-runtime (#8007), the clusterctl client should default to using the lazy rest mapper. It would be better to perform some tests (similar to how you got the initial metrics of 5-6m) to see if it addresses the need.
That, combined with increasing the defaults to match the upstream values, should be a good place to start.
Are there any strong needs to make them configurable?
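As a rough way to time the discovery walk that gets throttled, a sketch (assumes a kubeconfig at the default location; the limit values are illustrative):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	// Pin the limits explicitly so the comparison isn't skewed by defaults.
	cfg.QPS = 50
	cfg.Burst = 300

	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}

	start := time.Now()
	// Full discovery walk: roughly one request per group/version, the hot
	// path when many CRDs are installed.
	_, resources, err := dc.ServerGroupsAndResources()
	if err != nil {
		panic(err)
	}
	fmt.Printf("discovered %d group/versions in %s\n", len(resources), time.Since(start))
}
```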

@nojnhuh
Contributor Author

nojnhuh commented May 9, 2023

Thanks @ykakarap, I'll retest once the controller-runtime version bump merges. AFAICT kubectl doesn't make these values configurable, so I think we can reasonably leave the values in clusterctl hardcoded to match kubectl's, at least for the short to medium term. If increasing the default burst still makes things significantly faster with the lazy restmapper, I'd be in favor of making that change by itself and making the values configurable, if need be, in a separate PR. I'll follow up here once I've done that testing with controller-runtime 0.15.

@ykakarap
Contributor

ykakarap commented May 9, 2023

Sounds good. Thank you :)

@sbueringer
Member

Sounds good. As of controller-runtime v0.15 the lazy restmapper will be the only restmapper, so it will be used out of the box.

@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 17, 2023
@sbueringer
Member

@nojnhuh Just fyi, CR bump has been merged now if you want to test again :)

@nojnhuh
Contributor Author

nojnhuh commented May 25, 2023

@nojnhuh Just fyi, CR bump has been merged now if you want to test again :)

Just tried this out and it's definitely a lot better than before! clusterctl init still seems a few seconds slower overall when lots of CRDs are installed, but I think that's reasonable in our case. Thanks @sbueringer for the CR bump!

/close

@k8s-ci-robot
Contributor

@nojnhuh: Closed this PR.

In response to this:

@nojnhuh Just fyi, CR bump has been merged now if you want to test again :)

Just tried this out and it's definitely a lot better than before! clusterctl init still seems a few seconds slower overall when lots of CRDs are installed, but I think that's reasonable in our case. Thanks @sbueringer for the CR bump!

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@nojnhuh nojnhuh deleted the throttle branch May 25, 2023 19:04
Labels: area/clusterctl · cncf-cla: yes · needs-rebase · size/L