
Disable client-side rate-limiting when AP&F is enabled #111880

Open
negz opened this issue Aug 17, 2022 · 21 comments
Labels
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
kind/feature: Categorizes issue or PR as related to a new feature.
sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

negz (Contributor) commented Aug 17, 2022

What would you like to be added?

I'd like client-side rate limiting to be disabled in client-go when API server-side Priority and Fairness is enabled.

All client-go based Kubernetes clients are configured by default to use a token bucket rate limiter in an attempt to avoid overloading the API server. This is a form of open-loop control, in that the clients don't factor in the load of the API server in deciding whether to rate-limit themselves.
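For illustration, here is a minimal runnable sketch of that token bucket using client-go's own `flowcontrol.NewTokenBucketRateLimiter` helper (the same constructor `rest` uses internally; the 5 qps / burst 10 values are the defaults discussed next, and the `main` wrapper is just a demo):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	// The limiter client-go builds when rest.Config.QPS and Burst are
	// left unset: refills at 5 tokens/s, holds at most 10 tokens.
	limiter := flowcontrol.NewTokenBucketRateLimiter(5, 10)

	start := time.Now()
	for i := 0; i < 20; i++ {
		// Accept blocks until a token is available. There is no feedback
		// from the server here: this is open-loop control.
		limiter.Accept()
	}
	// The first ~10 requests consume the initial burst immediately; the
	// remaining ~10 are spaced at 5/s, so this takes roughly 2 seconds.
	fmt.Printf("20 requests took %v\n", time.Since(start))
}
```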

Per the below links, the default rate limit for a particular client is currently 5 qps, bursting to 10 qps. Discovery clients are particularly noisy and have thus had their burst raised most recently to 300 qps. I believe this number was picked because it's around the number of API groups that need to be discovered (i.e. HTTP requests that need to be made in short order) when @crossplane is installed with all of the "big three" providers (i.e. CRDs representing all AWS, Azure, and GCP APIs).

API Priority and Fairness has been enabled by default (as a beta feature) since Kubernetes v1.20. It allows the API server to prioritize and queue requests, and can let a particular client know when it should limit its request rate by returning a response with HTTP status code 429 "Too Many Requests". REST clients appear to respect this status code and will back off their requests when they encounter it per https://github.com/kubernetes/client-go/blob/a890e7b/rest/urlbackoff.go.
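As a hedged sketch of what relying on APF instead of the client-side limiter looks like for a single client today: with a nil `RateLimiter` and a negative `QPS`, client-go skips building the token bucket entirely, leaving throttling to the server's 429 responses (the helper name here is illustrative, not an existing API):

```go
package main

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// clientWithoutClientSideLimits is a hypothetical helper: it copies the
// config and opts out of client-side throttling, on the assumption that
// the server's APF (plus the 429 backoff in rest/urlbackoff.go) provides
// the protection instead.
func clientWithoutClientSideLimits(config *rest.Config) (*kubernetes.Clientset, error) {
	cfg := rest.CopyConfig(config)
	cfg.QPS = -1   // negative QPS: client-go builds no token-bucket limiter
	cfg.Burst = -1 // ignored once QPS is negative
	cfg.RateLimiter = nil
	return kubernetes.NewForConfig(cfg)
}
```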

Disabling client-side rate-limiting appears to be a goal of AP&F per its graduation criteria:

PF allows us to disable client-side rate limiting without causing the apiservers to wedge/crash. Note that there is another level of concern that APF does not attempt to address, which is mismatch between the throughput that various controllers can sustain.

There seems to be a general consensus amongst API machinery maintainers that we should just stop worrying and learn to love AP&F per #105520 (comment).

Why is this needed?

The open-loop nature of client-go's rate limiter means that it's possible (indeed likely) that clients will rate limit themselves too much or not enough. The former is particularly painful for CLI tools like kubectl, helm, and kpt, where a series of throttled requests directly makes a particular CLI command take longer than it needs to. Again, this is an area where discovery is particularly painful: when kubectl's discovery burst was set to 100 qps we were seeing some commands take more than 5 minutes to complete, with users seeing logs like the below while waiting for their command to finish.

Waited for 1.033772408s due to client-side throttling, not priority and fairness, request: GET:https://api.example.org/apis/pkg.crossplane.io/v1?timeout=32s

To this end, several tools have already bumped or disabled their client-side rate limits.

Notably I'm pretty sure @crossplane will be exceeding the 300 qps discovery burst fairly soon, which will result in another round of PRs to bump the limit further if we don't remove it entirely.

negz added the kind/feature label Aug 17, 2022
k8s-ci-robot added the needs-sig and needs-triage labels Aug 17, 2022
negz (Contributor, Author) commented Aug 17, 2022

/sig api-machinery

negz (Contributor, Author) commented Aug 17, 2022

/cc @apelisse @MikeSpreitzer

k8s-ci-robot added the sig/api-machinery label and removed the needs-sig label Aug 17, 2022
negz (Contributor, Author) commented Aug 17, 2022

/kind cleanup

Not sure if CI will let me make this change, but it feels more appropriate than "feature".

k8s-ci-robot added the kind/cleanup label Aug 17, 2022
MadhavJivrajani (Contributor) commented:

Related: #109614
/cc @wojtek-t @tkashem

MikeSpreitzer (Member) commented:

This raises some questions.

  1. Since enabling/disabling APF is done independently at each server, what is the desired behavior when the servers are not all the same in this regard?
  2. How does the client know whether the servers have APF enabled or disabled?

MikeSpreitzer (Member) commented Aug 17, 2022

Here are some suggested answers.

  1. For a given request, the client reading the response can tell whether the server has APF enabled by looking for the X-Kubernetes-Pf-Flowschema-Uid and X-Kubernetes-Pf-Prioritylevel-Uid headers (a sketch of this check follows the list).
  2. A client that has not gotten any response (yet, or in the last hour) assumes that the servers do not have APF enabled and applies client-side rate limiting.
  3. If some responses have been received in the last hour and all of those indicate that the server has APF enabled, then the client stops applying client-side rate-limiting.
  4. If some responses have been received in the last hour and any of those indicates that APF is not enabled, then the client applies client-side rate-limiting.
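For illustration, a minimal sketch of the header check from point 1, assuming a cheap probe against `/livez/ping` (the probe path and function name are illustrative choices, not an established client-go API):

```go
package main

import (
	"context"
	"net/http"

	"k8s.io/client-go/rest"
)

// detectAPF issues one cheap request and reports whether the response
// carries the UID headers APF attaches when it handled the request.
func detectAPF(ctx context.Context, config *rest.Config) (bool, error) {
	rt, err := rest.TransportFor(config) // RoundTripper honoring the config's auth/TLS
	if err != nil {
		return false, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, config.Host+"/livez/ping", nil)
	if err != nil {
		return false, err
	}
	resp, err := rt.RoundTrip(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.Header.Get("X-Kubernetes-Pf-Flowschema-Uid") != "", nil
}
```

Caching the answer with a time window, and falling back to client-side limiting on any negative or missing answer, would implement points 2 through 4.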

negz (Contributor, Author) commented Aug 17, 2022

How does the client know whether the servers have APF enabled or disabled?

There seems to be some prior art for this in kubernetes-sigs/cli-utils#584 (CC @karlkfi)

MikeSpreitzer (Member) commented:

The technique in kubernetes-sigs/cli-utils#584 :

  1. Looks at a response header like I suggested.
  2. Adds an extra request per client.
  3. Assumes that all the servers have the same configuration and it does not change over time.

karlkfi (Contributor) commented Aug 17, 2022

FWIW, the impl I wrote seems to work pretty well in kpt when used by humans. But when using something similar in Config Sync (a multi-instance GitOps operator), we've noticed that the load produced is very high, especially with many instances running in parallel, and the default P&F config doesn't seem able to mitigate the load. This forces users to write their own P&F configs, which they generally don't know how to do. So I'm not sure this approach or P&F is actually mature enough, or easy enough to use, to propagate this to all K8s clients. You might end up with a lot of broken clusters.

That said, I'd love for someone to pursue it and report issues to help improve P&F with this kind of dogfooding.

negz (Contributor, Author) commented Aug 17, 2022

I'd love for someone to pursue it and report issues to help improve P&F with this kind of dogfooding.

Yeah, that's my underlying motive here - to discover what needs to be done to feel comfortable removing these rate limits.

negz (Contributor, Author) commented Aug 17, 2022

we've been noticing that the load produced is very high, especially with many instances in parallel, and the default P&F config doesn't seem configured to be able to mitigate the load, forcing users to make their own P&F configs, which they generally don't know how to do.

@karlkfi Can you elaborate on what you see when the load gets high? Just reading through the docs I'm guessing all Config Sync instances are getting put into the workload-low priority level. Is the issue that this priority level isn't granular enough and that other instances at the same level are experiencing issues (time spent in queues? 429s?)? Are you seeing stuff in workload-low affecting other (lower?) priority levels?

karlkfi (Contributor) commented Aug 17, 2022

Unfortunately, the problem is only partially diagnosed and is a customer issue, so I can't share the exact details. I haven't had a chance to reproduce it myself yet, due to a number of other related load-causing issues. But part of the issue is thousands of clients all hammering the apiserver with the same P&F config, due to the multi-instance way Config Sync handles namespace tenancy.

MikeSpreitzer (Member) commented Aug 18, 2022

If you are finding APF is not working well then that is very interesting indeed. Let me see if I understand the bad symptom. Is it the apiserver(s) using a lot of CPU?

Thousands of clients of the same sort are normal in a large cluster. The self-protection is based on the number of requests that the server actually works on at once, with the rest queued or rejected, so simply increasing the number of clients of some sort should not defeat that.

What do the offending requests do? Any insight into the pathology, or even how to reproduce it, would be great.

leilajal (Contributor) commented:

/triage accepted

k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Nov 16, 2022
negz (Contributor, Author) commented Nov 16, 2022

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Nov 16, 2022
taliastocks commented:

Is this change still planned? It doesn't look like there has been movement in over a year.

Jefftree (Member) commented:

@negz: Aggregated Discovery (kubernetes/enhancements#3352) has been in beta since 1.27 and effectively reduces the number of discovery requests from an unbounded amount to 2. Is this still an issue for crossplane?
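For context, aggregated discovery collapses per-group discovery into a single request each against `/api` and `/apis`, negotiated via an Accept header. A hedged sketch of that request (the `v2beta1` content type matches the beta as of 1.27, and the helper name is illustrative):

```go
package main

import (
	"context"
	"net/http"

	"k8s.io/client-go/rest"
)

// fetchAggregatedDiscovery asks for the whole discovery document in one
// round trip instead of one request per API group.
func fetchAggregatedDiscovery(ctx context.Context, config *rest.Config) (*http.Response, error) {
	rt, err := rest.TransportFor(config)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, config.Host+"/apis", nil)
	if err != nil {
		return nil, err
	}
	// Content negotiation for the aggregated document (KEP-3352).
	req.Header.Set("Accept",
		"application/json;g=apidiscovery.k8s.io;v=v2beta1;as=APIGroupDiscoveryList")
	return rt.RoundTrip(req)
}
```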

karlkfi (Contributor) commented Dec 28, 2023

For what it's worth, we implemented this a year and a half ago in Config Sync v1.12.0 and it seems to work fine. It detects server-side throttling and disables client-side throttling.

Other than one initial customer complaint that seemed to be caused by a severely overloaded and underpowered apiserver, I haven't had any more customer complaints.

One notable downside, though, is that it punts configuration of server-side throttling to the cluster admin, and most have no experience configuring it. The defaults are probably ok for many, but we haven't done the extensive testing required to figure out which edge cases require configuration. This shifts some responsibility from tenant to admin in multi-tenant use cases.

Another minor downside is that the detection requires making an API call, which can fail, needs its own recovery mechanism, and has to be run before other calls, which complicates app startup. Delaying detection is also challenging because controller-runtime and client-go have different ways of constructing clients from config. So there might still be some value in having this detection built in.

https://github.com/GoogleContainerTools/kpt-config-sync/blob/1cf82eaa4c27016c38fa9ed3f4316aa2d8b52074/pkg/client/restconfig/restconfig.go#L102
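A hedged sketch of the startup flow described above, reusing the `detectAPF` probe sketched earlier in this thread (all names are illustrative; the linked restconfig.go is the real implementation):

```go
package main

import (
	"context"

	"k8s.io/client-go/rest"
)

// configureRateLimiting probes once at startup and only disables the
// client-side token bucket if the server advertises APF. The probe is
// itself an API call that can fail, so callers need their own retry policy.
func configureRateLimiting(ctx context.Context, config *rest.Config) error {
	enabled, err := detectAPF(ctx, config) // probe sketched in an earlier comment
	if err != nil {
		return err
	}
	if enabled {
		config.QPS = -1 // negative QPS: no client-side limiter is built
		config.Burst = -1
		config.RateLimiter = nil
	}
	return nil
}
```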

negz (Contributor, Author) commented Jan 2, 2024

Is this still an issue anymore for crossplane?

It's not currently a big issue for Crossplane, but I'd still advocate for doing it.

bboreham (Contributor) commented:

currently 5 qps, bursting to 10 qps.

Nitpick: the unit for burst is "q", not per second.

Since requests are sent one at a time, this limit only comes into effect after an idle period. E.g. with a 5 qps rate limit, if you send no requests for 2 seconds then the burst lets you send 10 as fast as you can before limiting kicks in again.
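A small runnable illustration of this point, assuming client-go's `flowcontrol` token bucket with the 5 qps / burst 10 settings discussed above:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	limiter := flowcontrol.NewTokenBucketRateLimiter(5, 10)

	// Drain the initial burst of 10 tokens.
	for i := 0; i < 10; i++ {
		limiter.TryAccept()
	}

	time.Sleep(2 * time.Second) // idle: 5 tokens/s * 2s = 10 tokens refilled

	// TryAccept doesn't block, so this counts the tokens now available:
	// roughly 10, i.e. burst is a token count, not a rate.
	n := 0
	for limiter.TryAccept() {
		n++
	}
	fmt.Println("requests allowed after 2s idle:", n)
}
```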

Aetf added a commit to Aetf/kluster-code that referenced this issue Mar 24, 2024
The default value is too small, and given
kubernetes/kubernetes#111880, it is not really
needed.
rishabh-11 pushed a commit to gardener/autoscaler that referenced this issue May 21, 2024

* Cleanup: Remove separate client for k8s events

Remove RateLimiting options - rely on APF for apiserver protection.
Details: kubernetes/kubernetes#111880
