Karpenter support for cluster autoscaling #2712

Open
alexisbel1 opened this issue Dec 27, 2021 · 104 comments
Labels: cluster-autoscaler; feature-request (Requested Features); Scale and Performance (Use this for any AKS scale or performance related issue)

Comments

@alexisbel1

alexisbel1 commented Dec 27, 2021

Karpenter is an open-source node provisioning project built for Kubernetes. Its goal is to improve the efficiency and cost of running workloads on Kubernetes clusters. Karpenter works by:

  • Watching for pods that the Kubernetes scheduler has marked as unschedulable
  • Evaluating scheduling constraints (resource requests, node selectors, affinities, tolerations, and topology spread constraints) requested by the pods
  • Provisioning nodes that meet the requirements of the pods
  • Scheduling the pods to run on the new nodes
  • Removing the nodes when the nodes are no longer needed
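
The steps listed above can be sketched as a single reconcile pass. This is only an illustration under simplifying assumptions, not Karpenter's implementation: the `Pod` and `Node` classes, the `CATALOG` of instance sizes, and the `reconcile` function are all hypothetical names, and only CPU and memory requests are considered (no selectors, affinities, tolerations, or topology spread).

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    cpu: float         # requested vCPUs
    memory_gib: float  # requested memory in GiB


@dataclass
class Node:
    instance_type: str
    cpu: float
    memory_gib: float
    pods: list = field(default_factory=list)

    def fits(self, pod):
        """True if the pod's requests fit in this node's remaining capacity."""
        used_cpu = sum(p.cpu for p in self.pods)
        used_mem = sum(p.memory_gib for p in self.pods)
        return (used_cpu + pod.cpu <= self.cpu
                and used_mem + pod.memory_gib <= self.memory_gib)


# Hypothetical catalog of (name, vCPUs, memory GiB), ordered smallest first.
CATALOG = [("small", 2, 8), ("medium", 4, 16), ("large", 8, 32)]


def reconcile(unschedulable, nodes):
    """One pass: place pending pods, provision nodes as needed, drop empty nodes."""
    for pod in unschedulable:
        target = next((n for n in nodes if n.fits(pod)), None)
        if target is None:
            # Pick the smallest catalog entry that satisfies the pod's requests.
            spec = next((s for s in CATALOG
                         if s[1] >= pod.cpu and s[2] >= pod.memory_gib), None)
            if spec is None:
                continue  # nothing in the catalog is big enough; pod stays pending
            target = Node(*spec)
            nodes.append(target)
        target.pods.append(pod)
    # Nodes with no pods are no longer needed.
    return [n for n in nodes if n.pods]
```

Calling `reconcile` with a pod that does not fit on any existing node adds a node sized from the catalog, and a node that ends the pass without pods is dropped from the result.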

Karpenter has many advantages over cluster autoscaler. One prerequisite would be that AKS can manage multiple instance types without defining multiple node pools.

Currently the only cloud provider that supports Karpenter is AWS.

It would be awesome to have AKS support it.

@ghost ghost added the triage label Dec 27, 2021
@ghost

ghost commented Dec 27, 2021

Hi alexisbel1, AKS bot here 👋
Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

  1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
  2. Please abide by the AKS repo Guidelines and Code of Conduct.
  3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
  4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
  5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
  6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

@ghost ghost added the action-required label Dec 30, 2021
@ghost

ghost commented Dec 30, 2021

Triage required from @Azure/aks-pm

@ghost

ghost commented Jan 4, 2022

Action required from @Azure/aks-pm

@ghost ghost added the Needs Attention 👋 Issues needs attention/assignee/owner label Jan 4, 2022
@ghost ghost removed triage action-required Needs Attention 👋 Issues needs attention/assignee/owner labels Jan 10, 2022
@aido123

aido123 commented Jan 16, 2022

+1

@AstritCepele

+1

@nahum-litvin-hs

+1 would also love to see this.

@apton-sooraj

+1

@tarun-asthana

+1

@trash-anger

+1

@laport-n

laport-n commented Apr 7, 2022

+1 !

@palma21
Member

palma21 commented Apr 26, 2022

Karpenter is an open-source node provisioning project built for Kubernetes.

Looking at the project, it doesn't seem generic enough to be used across all of Kubernetes; it seems to only work with AWS.

One prerequisite would be that AKS can manage multiple instance types without defining multiple node pools.

Unfortunately, VMSS today only supports one type, but there is work being done by the VMSS team to allow for this.

All the bullet points you mentioned are in scope for cluster autoscaler, but you mentioned Karpenter has many advantages over CA. Could you be a bit more specific on those? What things would you like to accomplish on AKS?

@alexisbel1
Author

All the bullet points you mentioned are in scope for cluster autoscaler, but you mentioned Karpenter has many advantages over CA. Could you be a bit more specific on those? What things would you like to accomplish on AKS?

The main advantage over CA is the ability to provision new VM types based on workload requirements (resources, taints, ...). CA only scales the number of VMs of a single type up and down within a VMSS, which is why Karpenter support would require allowing multiple VM types in the same node group. If the VM type does not match the workload requirements (e.g. the pod needs a GPU), the pod won't be able to start.
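
To make the difference concrete, here is a rough sketch that contrasts a fixed-type pool with requirement-driven selection. Everything in it is an assumption for illustration: the `VMType` class, the `pool_can_host` and `pick_vm_type` helpers, and the VM names `general-4` and `gpu-8` are made up and do not correspond to real Azure sizes or to any CA or Karpenter API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    name: str
    cpu: int
    gpus: int


def pool_can_host(pool_type, pod_cpu, pod_gpus):
    """Fixed-type pool: scaling only adds more VMs of this one type."""
    return pool_type.cpu >= pod_cpu and pool_type.gpus >= pod_gpus


def pick_vm_type(catalog, pod_cpu, pod_gpus):
    """Requirement-driven: choose the smallest type that satisfies the pod."""
    candidates = [t for t in catalog
                  if t.cpu >= pod_cpu and t.gpus >= pod_gpus]
    # Prefer fewer GPUs, then fewer vCPUs; None if nothing fits.
    return min(candidates, key=lambda t: (t.gpus, t.cpu), default=None)


general = VMType("general-4", cpu=4, gpus=0)   # hypothetical VM type
gpu = VMType("gpu-8", cpu=8, gpus=1)           # hypothetical VM type

# A GPU pod can never start on a pool made only of general-purpose VMs,
# no matter how many of them are added.
stuck = not pool_can_host(general, pod_cpu=2, pod_gpus=1)

# With the whole catalog available, a matching type can be selected per pod.
chosen = pick_vm_type([general, gpu], pod_cpu=2, pod_gpus=1)
```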

@guettli

guettli commented Apr 28, 2022

Cluster API supports several providers: https://cluster-api.sigs.k8s.io/

@wasabii

wasabii commented May 5, 2022

For some context for others here:

Karpenter is new, and it was written by Amazon. However, it's intended that additional cloud providers be added to it, just like cluster autoscaler. The source is open, and it's waiting for engineers to contribute.

It would ideally be the task of the Azure/AKS teams to provide the necessary resources to implement the Azure provider.

What differentiates it from the cluster auto scaler is it has no concept of Node Groups. There is no need to allocate a classification of a Node Group up front. Instead, it examines the requirements, and expects the cloud provider to be able to allocate exactly what it needs, from the smorgasbord of offerings the cloud provider might have.

That means if you need a machine with 32 GB of RAM, it'll go make one for you. If you need a spot instance, it'll go make one for you. If you need a node in AZ 2, it'll go make one for you. It doesn't require you to define all of the possible classes up front. It can also consult the cloud provider for the most cost-effective option that meets the requirements at the moment.
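
As a rough illustration of "most cost-effective option that meets the requirements", the sketch below filters a list of offerings by memory, zone, and spot/on-demand, then takes the cheapest. The `Offering` class, the `cheapest_offering` helper, the VM names, and all prices are invented for the example; a real provider would fetch this data from the cloud's APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    vm_type: str
    memory_gib: int
    zone: str
    spot: bool
    price_per_hour: float  # made-up numbers, for illustration only


def cheapest_offering(offerings, min_memory_gib=0, zone=None, spot=None):
    """Return the cheapest offering matching the constraints, or None."""
    matches = [o for o in offerings
               if o.memory_gib >= min_memory_gib
               and (zone is None or o.zone == zone)
               and (spot is None or o.spot == spot)]
    return min(matches, key=lambda o: o.price_per_hour, default=None)


OFFERINGS = [
    Offering("mem-32", 32, "1", spot=False, price_per_hour=0.40),
    Offering("mem-32", 32, "2", spot=False, price_per_hour=0.42),
    Offering("mem-32", 32, "2", spot=True, price_per_hour=0.10),
    Offering("mem-64", 64, "2", spot=False, price_per_hour=0.80),
]

# "I need 32 GB of RAM, on-demand, in zone 2."
pick = cheapest_offering(OFFERINGS, min_memory_gib=32, zone="2", spot=False)
```

No node group is defined up front here: the constraints come from the workload, and the selection is made against whatever is on offer at that moment.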

This does present some architectural challenges as to how this would be surfaced in AKS. Would it just go and create VMs one by one? Would it still use a VMSS, but require arbitrary resource request support within a VMSS? How will network topology be defined in the former? Etc.

But it is a much more extensible approach than the way CA is built, at least for cloud providers. Azure obviously has hundreds of VM families, series, sizes, disk capabilities, operating systems, etc., and the Cartesian product of them all is massive.

@ellistarn

ellistarn commented Jun 8, 2022

👋 I lead the Karpenter project. We'd love to collaborate on additional cloud providers and have done our best to factor out a simple and extensible cloud provider API to minimize the effort for other providers to adopt. If you're interested in chatting about the project, feel free to join in at our working group.

@pavneeta pavneeta added the Scale and Performance Use this for any AKS scale or performance related issue label Jul 18, 2022
@dkbhadeshiya

This would really be an interesting feature to support Azure/AKS

@seyal84

seyal84 commented Sep 26, 2022

Interesting, following this for the future. I can't wait to test this in AKS whenever the feature is supported.

@markthebault

+1

@JungBin-Eom

+1 Really interesting feature👍

@SudhamshBachu

+1

@dunefro

dunefro commented Aug 17, 2023

+1

@bplasmeijer

Choosing between smaller and larger node types in Kubernetes depends on factors like workload needs, costs, and efficiency.

Here are the pros and cons:

Smaller Nodes:

Pros:

  1. Cost-Efficiency: Lower expenses for lightweight workloads.
  2. Resource Use: Efficient resource allocation for less demanding tasks.
  3. Scaling: Easier fine-tuned scaling.
  4. Isolation: Better workload separation.

Cons:

  1. Limited Resources: Potential performance issues for demanding tasks.
  2. Complexity: More management for numerous small nodes.
  3. Network Impact: Higher network load.

Bigger Nodes:

Pros:

  1. Performance: Handles heavy workloads effectively.
  2. Simplicity: Easier management.
  3. Consolidation: Efficiently accommodates multiple smaller tasks.

Cons:

  1. Costly: Higher expenses, especially if resources are not fully used.
  2. Resource Use: Wastage if not optimized.
  3. Scalability: Overprovisioning risks.

A mix of small and large nodes could suit diverse workloads. Kubernetes features like resource management aid practical usage, regardless of node size. Base decisions on workload, performance, cost, and operations, with regular monitoring for optimization.
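
A back-of-the-envelope way to compare node sizes is to count how many nodes a workload needs and how much capacity sits idle. The sketch below does this for CPU only; the `nodes_needed` helper and the numbers are illustrative assumptions, and it ignores memory, per-node overhead, and bin-packing fragmentation.

```python
import math


def nodes_needed(total_cpu, node_cpu):
    """Return (node count, idle vCPUs) for a workload on uniform nodes."""
    count = math.ceil(total_cpu / node_cpu)
    idle = count * node_cpu - total_cpu
    return count, idle


# A workload requesting 10 vCPUs in total:
small = nodes_needed(10, 4)    # (3, 2): three 4-vCPU nodes, 2 vCPUs idle
large = nodes_needed(10, 16)   # (1, 6): one 16-vCPU node, 6 vCPUs idle
```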

⚠️ The goal: save cost and optimize the compute workload, with no wasted compute.

@asahnovskiy-deloitte

+1

@geekyshameem

+1

@ppodevlabs

I am told that this was announced accidentally, but should be coming out in a week or two.

On Thu 20 Apr 2023, 11:48, Pierrick Brossin wrote: Google cached it (https://webcache.googleusercontent.com/search?q=cache:Y_-RP7rnjqAJ:https://azure.microsoft.com/en-us/updates/karpenter-support-in-aks/), hopefully something's being worked on behind the scenes :-)

well, it is taking a bit longer.... Any news?

@bplasmeijer

Any update @palma21???

@palma21
Member

palma21 commented Sep 7, 2023

All updates for this are now tracked in (and depend on) kubernetes/org#4258

AKS does have an autoprovision item that will be in preview before the end of the year.
#2442

@lupass93

+1

@ronmegini

+1

@sravanakinapally

+1

@palma21
Member

palma21 commented Nov 7, 2023

https://twitter.com/jorgefpalma/status/1721944779011858729

https://github.com/Azure/karpenter

@stevehipwell

@palma21 I don't see any docs for how to actually use Karpenter in AKS? 👀

@PixelRobots
Collaborator

@palma21 I don't see any docs for how to actually use Karpenter in AKS? 👀

Currently only the open source option that's linked above is available. Hopefully a built-in option such as an AKS add-on will be out soon.

You can check out this blog post on how to use the open source method.

https://massimocrippa.com/blog/f/karpenter-provider-for-aks

@stevehipwell

Thanks for the link @PixelRobots. I'm all for deploying my own K8s components, but I don't think we're really at the stage for an announcement given that there isn't an official build of the image yet. Or any official docs as to how it works like there is for EKS.

I'd also like to know how AKS plans to maintain their provisioner without keeping a fork of the whole code base?

@palma21
Member

palma21 commented Nov 13, 2023

Not yet, we just announced the OSS provider. The add-on and respective docs will come soon.

The provider code is cloud-specific; the core code is shared by all providers. We're not planning to fork the core code (same as with all OSS projects we use).
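
For readers unfamiliar with this kind of split, the sketch below shows the general pattern: shared logic written against an interface, with each cloud supplying its own implementation. It is a generic illustration only. The `CloudProvider` class, its method names, `InMemoryProvider`, and `provision` are hypothetical and are not Karpenter's actual interface.

```python
from abc import ABC, abstractmethod
from itertools import count


class CloudProvider(ABC):
    """Hypothetical provider interface; NOT Karpenter's actual API."""

    @abstractmethod
    def instance_types(self):
        """Return the names of the instance types this cloud offers."""

    @abstractmethod
    def create(self, instance_type):
        """Create a node of the given type and return its ID."""

    @abstractmethod
    def delete(self, node_id):
        """Delete the node with the given ID."""


class InMemoryProvider(CloudProvider):
    """Stand-in for a cloud-specific provider, used only for illustration."""

    def __init__(self, types):
        self._types = list(types)
        self._ids = count(1)
        self.nodes = {}

    def instance_types(self):
        return list(self._types)

    def create(self, instance_type):
        node_id = f"node-{next(self._ids)}"
        self.nodes[node_id] = instance_type
        return node_id

    def delete(self, node_id):
        del self.nodes[node_id]


def provision(provider, instance_type):
    """Shared 'core' logic: it only talks to the CloudProvider interface."""
    if instance_type not in provider.instance_types():
        raise ValueError(f"unsupported instance type: {instance_type}")
    return provider.create(instance_type)
```

Because `provision` depends only on the interface, swapping `InMemoryProvider` for another implementation would not require changing, or forking, the shared function.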

@stevehipwell

@palma21 I'm not looking for an add-on, but I'd like an OCI image and some docs explaining how the AKS implementation works to go with the announcement. Last time I checked, the repo seemed to contain the docs for the AWS version and pointed to that site.

@justindavies
Contributor

@stevehipwell The documentation for the AKS provider will be released with the add-on, and the open source repository will be updated with a link to that documentation for customers to reference, along with the AKS specifics in the repo itself. As Jorge mentioned, the repository we announced contains the AKS Karpenter provider and will be the home for any provider-specific work we do.

@0dragosh

@justindavies is there a roadmap we can all follow so we prevent questions like "what's the ETA" ?

@justindavies
Contributor

Good morning all, just to let everyone know we announced Node Autoprovision earlier today: https://learn.microsoft.com/en-gb/azure/aks/node-autoprovision?tabs=azure-cli

@bplasmeijer

@justindavies any update on Windows containers?

@tppalani

Hi All

I just want to know the current status: will Azure support the Karpenter autoscaler, or is it still under discussion?

@zioproto
Contributor

This issue should probably be closed. Please check the following links:

@bplasmeijer

bplasmeijer commented May 20, 2024 via email

@bplasmeijer

@EppO

EppO commented Jul 23, 2024

The only network configuration allowed is Azure CNI Overlay with Powered by Cilium.

This is a strong requirement too, I hope it will cover more possible AKS configurations before it reaches GA.

@ashutoshrathore

When can we expect it to be GA? We are using Linux nodes and would like to know if we can use it in production with MS support.

@zioproto
Contributor

zioproto commented Sep 9, 2024

When can we expect it to be GA? We are using Linux nodes and would like to know if we can use it in production with MS support.

Hello, if you are asking about the GA of Node autoprovisioning, the correct GitHub issue to follow is:
#2442

I believe the intention of this issue was to track the possibility of using the open source Karpenter with AKS.

As I already commented, this issue should probably be closed?

Projects
Status: Public Preview (Shipped & Improving)