Cluster Autoscaler CAPI provider should support scaling to and from zero nodes #3150

Closed
elmiko opened this issue May 21, 2020 · 42 comments · Fixed by #4840
Labels
area/cluster-autoscaler, area/provider/cluster-api

Comments

elmiko commented May 21, 2020

As a user I would like the ability to have my MachineSets and MachineDeployments scale to and from zero replicas. I should be able to set a minimum size of 0 for a Machine[Set|Deployment] and have the autoscaler take the appropriate actions.

This issue is CAPI provider specific, and will require some modifications to the individual CAPI providers before the feature can be merged into the autoscaler code.
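
As a rough sketch of the desired user experience, a MachineDeployment opted in to autoscaling might look like the following. The API versions, resource names, and referenced templates are placeholders, and the min/max size annotations are the node-group size annotations the CAPI provider already recognizes:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: workers                        # placeholder name
  namespace: default
  annotations:
    # node-group size bounds read by the cluster-autoscaler CAPI provider;
    # a minimum of zero is what this issue asks to support
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "0"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"
spec:
  clusterName: my-cluster              # placeholder cluster
  selector:
    matchLabels: {}                    # placeholder selector
  template:
    spec:
      clusterName: my-cluster
      version: v1.18.3                 # placeholder Kubernetes version
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: KubeadmConfigTemplate
          name: workers-bootstrap      # placeholder
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: DockerMachineTemplate    # placeholder infrastructure kind
        name: workers-infra            # placeholder
```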

elmiko commented May 21, 2020

/area provider/cluster-api

k8s-ci-robot added the area/provider/cluster-api label May 21, 2020
seh commented May 27, 2020

How will the autoscaler determine which labels and taints to expect on nodes for its scheduling simulation? I see the taints may be available in the kubeadm NodeRegistrationOptions type.

elmiko commented May 28, 2020

@seh, if i understand your question correctly, this information is handled through the labels and taints on the MachineSets and MachineDeployments. when these resources are set to a minimum size of 0 and the autoscaler has removed all the Machines and Nodes, the MachineSets and MachineDeployments contain labels and taints which are used during the scale up process. the labels and taints will be applied to the new Node resources as they are created.

seh commented May 28, 2020

Before asking that question, when I went looking at the newest MachineSet type definition, I didn't see anything there about taints. Drilling down further into MachineSpec, it's not there either.

The only place I could find them was in the kubeadm NodeRegistrationOptions type. That's why I asked where you'll find the taints. Did I miss a pertinent field here?

elmiko commented May 28, 2020

Did I miss a pertinent field here?

no, i don't think you missed something, i think i may have missed something ;)

i have been working from a branch of the cluster-api code to test this behavior locally and with openshift. to make this work in our branch, we have the Taints persisted at the MachineSpec level. i think there will need to be some work done in the cluster-api project to expose this functionality, or at least a little deeper research.

there are other changes that will need to happen in CAPI as well, mainly around saving information about cpu/memory/gpu. your point about the taints is well placed though, i will add this to the list of changes.

seh commented May 28, 2020

For the machine resources, I figured that we'd do something like dive down to figure out the cloud provider and machine/instance type, and then consult the catalogs available elsewhere within the cluster autoscaler. I'm most familiar with AWS, and for that provider there used to be a static (generated) catalog, but now we fetch it dynamically via the AWS API when the program starts. With that catalog, you can learn of the machine's promised capabilities.

Perhaps, though, in the interest of eliminating dependencies among providers, the Cluster API provider would be blind to that information, which would be an unfortunate loss.

elmiko commented May 28, 2020

for the machine/instance resources, the solution i am working from currently is that the individual providers on the CAPI side will populate annotations in the Machine[Set|Deployment] that describe the cpu, memory, gpu, etc.

the method i am currently using has lookup tables for each provider (contained within the provider code) to assist in creating the resource requirements. i think having these values be dynamically populated by the CAPI side of things would certainly be worth looking into. ultimately though, the idea would be for each provider to own their implementation of the resource requirements, with a group of standard annotations that the autoscaler can use to assist in creating the machines for that group.

the information does come from the CAPI providers though, not from the autoscaler providers.
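
to sketch the shape of that, here is a metadata-only fragment of an annotated MachineSet. the annotation keys below are just illustrative placeholders, not a settled api:

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineSet
metadata:
  name: workers                 # placeholder name
  annotations:
    # illustrative capacity hints so the autoscaler can build a node
    # template for this group while it has zero machines
    capacity.cluster-autoscaler.kubernetes.io/cpu: "4"
    capacity.cluster-autoscaler.kubernetes.io/memory: "16G"
    capacity.cluster-autoscaler.kubernetes.io/gpu-count: "1"
    capacity.cluster-autoscaler.kubernetes.io/gpu-type: "nvidia.com/gpu"
```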

seh commented May 28, 2020

Understood. So long as it's all accurate and not too hard to maintain, that sounds fine.

What we ran into with the AWS provider for the autoscaler was that the catalog would fall out of step, which required generating fresh code, releasing a new autoscaler version, and then deploying that new container image version into clusters. AWS was coming out with new instance types often enough that that whole process felt too onerous. It seems that these new instance types come in waves. It's hard to balance the threat of falling out of date with the threat of the catalog fetching and parsing failing at run time.

elmiko commented May 28, 2020

that's an excellent point about the catalog falling out of step. if i understand the provider implementations through CAPI properly, and i might not ;) , we are using values for cpu, memory, etc, that the individual CAPI providers then turn into actual instance information at the cloud provider layer. so, in theory, this could be a call to the CAPI provider at creation time, eg. "give me a Machine that has X cpu slices, Y ram, and Z gpus", then the CAPI provider could either use a lookup table if appropriate or make some dynamic call to the cloud provider api.

edit: added some context to the overloaded "provider" terms

seh commented Jun 10, 2020

to make this work in our branch, we have the Taints persisted at the MachineSpec level. i think there will need to be some work done in the cluster-api project to expose this functionality, or at least a little deeper research.

Are there any open CAPI issues about this gap? Do you know if anyone is working on exposing the node taints and labels there? (Perhaps we can already get the labels.)

elmiko commented Jun 10, 2020

Are there any open CAPI issues about this gap? Do you know if anyone is working on exposing the node taints and labels there? (Perhaps we can already get the labels.)

i do not think issues have been opened on the CAPI side yet; there will need to be some discussion there about passing information about the node sizes through the CAPI resources. i am working from a proof of concept that has this working for aws, gcp, and azure, in which we use annotations for passing this information.

ideally i would like to contribute these patches back to the CAPI project, and bring the associated changes here as well, but i think we need to have a discussion on the CAPI side about this as it will require changes to several repos and some agreement about the method for passing information.

and we haven't even touched on the taints yet ;)

seh commented Jun 10, 2020

i think we need to have a discussion on the CAPI side about this as it will require changes to several repos and some agreement about the method for passing information.

Would you mind if I bring this up for discussion in the "cluster-api" Slack channel? I'd like to get a feel for how much work and resistance lies ahead, as I don't think we can adopt the cluster autoscaler with CAPI until we close this gap.

elmiko commented Jun 10, 2020

Would you mind if I bring this up for discussion in the "cluster-api" Slack channel? I'd like to get a feel for how much work and resistance lies ahead, as I don't think we can adopt the cluster autoscaler with CAPI until we close this gap.

please do!

if you'd like, we can bring this up during the weekly meeting today as well?

elmiko commented Jun 10, 2020

@seh just wanted to let you know that we talked about this at the CAPI meeting today, i don't think we have consensus yet but i didn't hear any hard objections. i think the next steps will be to do a little research around some other approaches to gather the cpu/mem/gpu requirements, and then create an enhancement proposal to discuss with the CAPI team.

CAPI meeting minutes 2020-06-10

seh commented Jun 10, 2020

That's great to hear. I'm sorry I wasn't able to attend the meeting today. I do see the topic covered in the agenda/minutes, though, so thank you for bringing it up.

I don't know yet what I can do to help make progress on this front. I have experience with kubeadm and the cluster autoscaler, but little with CAPI and CAPA so far. If you'd like review or help with the KEP, please let me know.

elmiko commented Jun 10, 2020

I don't know yet what I can do to help make progress on this front. I have experience with kubeadm and the cluster autoscaler, but little with CAPI and CAPA so far. If you'd like review or help with the KEP, please let me know.

i think the next steps will be to make a formal proposal to the CAPI group for getting this change into their releases, and then coordinating the autoscaler changes. i'm happy to CC you on any issues that come up around this, and perhaps we can work to get them merged. if you are interested in getting more involved with the CAPI provider code, i'm sure we could collaborate on getting the necessary changes in place.

seh commented Jun 17, 2020

I brought up some of these questions in the "cluster-api" Slack channel. See kubernetes-sigs/cluster-api#2461 for an overlapping request.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Sep 15, 2020
elmiko commented Sep 15, 2020

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Sep 15, 2020
unixfox commented Dec 3, 2020

Hello,

Sorry for the noise, but I just wanted to say that I'm also interested in this issue, mostly for deploying temporary workloads like Minecraft servers, coding environments (like GitHub Codespaces) and more.

This would also further close the feature gap between a self-hosted autoscaler and the autoscalers offered by managed Kubernetes solutions like DigitalOcean.
For instance, thanks to projects like machine-controller that implement cluster-api, it's possible to run our own autoscaler on DigitalOcean and even on unsupported cloud providers like Scaleway, Hetzner, Linode and more.

elmiko commented Dec 3, 2020

@unixfox just by means of an update, i have been working on a proof of concept for scaling from zero with capi. it's been going slower than i expected, but i feel we have good consensus about the initial implementation and with any luck 🍀 i should have something to show in early january.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Mar 3, 2021
unixfox commented Mar 3, 2021

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Mar 3, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Jun 8, 2021
unixfox commented Jun 8, 2021

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Jun 8, 2021
elmiko commented Jun 8, 2021

thanks for the bump @unixfox , i continue to hack away on this. the design has changed slightly since the first round of work on the enhancement. i need to update the enhancement and would like to give a demo at an upcoming cluster-api meeting.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Sep 6, 2021
unixfox commented Sep 6, 2021

Not sure if I would mark this issue as fresh or not. I stopped having the need for cluster autoscaler with Cluster API, but it's still a cool feature that has a lot of potential when trying to use cluster autoscaler on "unsupported" cloud providers.

elmiko commented Sep 7, 2021

i am still working towards this issue. we almost have agreement on the cluster-api enhancement, and i think it will merge in the next few weeks. then i will post a PR for the implementation.

@unixfox sorry to hear that we weren't able to deliver this feature in a time that would be helpful to you. i do appreciate your support though =)

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Sep 7, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Dec 14, 2021
elmiko commented Dec 14, 2021

the upstream cluster-api community has approved the proposal for the scale-from-zero feature. i am in the process of writing a patch that will satisfy the proposal, and also updating the kubemark provider to work with scaling from zero. i imagine this work won't be done till january, hopefully we will have it in for the 1.24 release of the autoscaler.

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Dec 14, 2021
@davidspek

@elmiko Do you have a link to a PR we can follow?

elmiko commented Mar 11, 2022

@davidspek i am hoping to have the PR ready next week, you can follow my progress on this branch for now https://github.com/elmiko/kubernetes-autoscaler/tree/capi-scale-from-zero

i have it working, but i need to do some cleanups around the dynamic nature of the client, and also add some unit tests. there is a complicated problem to solve wherein we need the client to become aware of the machine template types after it has started watching machinedeployments/machinesets, so that we can accurately set up the informers to watch the templates. i have the basic mechanism working on my branch, i'm just trying to make the dynamic client better now.

@davidspek

@elmiko Thanks for the info. I hope to have some time to test your changes soon. Do you maybe have a link to the Cluster API docs for infrastructure providers to support scale from 0? I haven’t been able to find that myself.

elmiko commented Apr 7, 2022

@davidspek my hope is that the enhancement[0] has enough details for a provider to implement scale from zero. if you find that there is detail lacking, please ping me as i would like to improve that doc =)

[0] https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210310-opt-in-autoscaling-from-zero.md
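
as a rough sketch of the provider-side piece, as i read that proposal, an infrastructure machine template advertises the expected capacity in its status; the kind, name, and values below are just placeholders:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DockerMachineTemplate        # placeholder infrastructure template kind
metadata:
  name: workers-infra              # placeholder name
status:
  # reported by the infrastructure provider so the autoscaler can build a
  # node template for a group that currently has zero machines
  capacity:
    cpu: "2"
    memory: "4G"
    nvidia.com/gpu: "1"
```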

@davidspek

@elmiko Thanks for the doc, I think that’ll likely answer most of my questions. Has that proposal already been accepted? Or more importantly, can this already be implemented in infrastructure providers without needing to change anything in the cluster api core library?

elmiko commented Apr 7, 2022

@davidspek yes it has been accepted, and no it should not require any changes in the core cluster-api.

i was able to implement scale from zero in the kubemark provider without modifying the core, you can see my PR here kubernetes-sigs/cluster-api-provider-kubemark#30

@davidspek

@elmiko Awesome, thank you very much for all the info and quick responses.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Jul 6, 2022
elmiko commented Jul 6, 2022

PR is currently under review for this, #4840
/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label Jul 6, 2022