Karpenter support for prioritizing AWS Capacity Reservations #3042
We have something very similar explained here: https://karpenter.sh/preview/concepts/scheduling/#savings-plans-and-reserved-instances
Ryan, while I appreciate this reserved-instance approach, it won't work with "targeted" capacity reservations, because those need to be "pointed to" by the EC2 Fleet API. The approach you mentioned also won't factor in the availability of capacity reservations by AZ, even when the reservation is of type "open". If I reserve 20 c4.large instances in each of 5 AZs, 100 in total, Karpenter can launch those 100 instances in any AZ, without regard for placing them evenly across the AZs. Similarly, if I had 10 reservations available across 4 AZs, there's no mechanism for Karpenter to place nodes in those AZs accordingly. Let me know if I'm missing something. Thanks,
Your provisioner can specify which AZs to deploy to. Worst case, you could have a provisioner per AZ.
I understand that, but how do I know I have 10 reservations across 4 AZs? There's no way the provisioner would know those exist; you, as the infrastructure engineer, would have to know how many are in each AZ, and that number could fluctuate as other workloads or components take up the reserved slots. Why not let EC2 Fleet figure it out, instead of putting the onus on the infrastructure engineers? And this still does not address targeted Capacity Reservations: if you have multiple workloads, or components of a workload, in the same account, with different reservations for each via reservation IDs, you are SOL.
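To illustrate the bookkeeping this pushes onto the infrastructure engineer: keeping a per-AZ picture of open slots means polling EC2's DescribeCapacityReservations API and aggregating the results yourself. A minimal Python sketch, where the input follows the shape of boto3's ec2.describe_capacity_reservations() response and the sample data is made up:

```python
from collections import defaultdict

def available_capacity_by_az(reservations):
    """Sum AvailableInstanceCount per AZ across active reservations.

    `reservations` follows the shape of boto3's
    ec2.describe_capacity_reservations()["CapacityReservations"].
    """
    by_az = defaultdict(int)
    for cr in reservations:
        if cr.get("State") == "active":
            by_az[cr["AvailabilityZone"]] += cr["AvailableInstanceCount"]
    return dict(by_az)

# Made-up sample data in the API's response shape:
sample = [
    {"AvailabilityZone": "us-east-1a", "AvailableInstanceCount": 7, "State": "active"},
    {"AvailabilityZone": "us-east-1b", "AvailableInstanceCount": 3, "State": "active"},
    {"AvailabilityZone": "us-east-1a", "AvailableInstanceCount": 2, "State": "expired"},
]
available_capacity_by_az(sample)  # {"us-east-1a": 7, "us-east-1b": 3}
```

The counts go stale the moment anything else consumes a slot, which is the core of the objection above.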
I can add a couple of points:
I also need the ability to reference targeted On-Demand Capacity Reservations. We use ODCRs for instance types that are capacity constrained, where we need a guarantee of being able to launch those instances.
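For reference, targeting a specific ODCR today means baking the reservation ID into a launch template via its CapacityReservationSpecification. A boto3-shaped sketch, where the template name and reservation ID are placeholders and the resulting dict would be passed to ec2.create_launch_template:

```python
def targeted_cr_launch_template(name, cr_id, instance_type):
    """Build CreateLaunchTemplate parameters that pin launches to a single
    targeted Capacity Reservation; pass to ec2.create_launch_template(**params)."""
    return {
        "LaunchTemplateName": name,
        "LaunchTemplateData": {
            "InstanceType": instance_type,
            "CapacityReservationSpecification": {
                "CapacityReservationTarget": {"CapacityReservationId": cr_id},
            },
        },
    }

# Placeholder reservation ID -- not a real reservation.
params = targeted_cr_launch_template("odcr-lt", "cr-0123456789abcdef0", "c4.large")
```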
I think it makes sense to support capacity reservations, but I am not sure it makes sense to go down the path of prioritization within the same NodePool. I think Karpenter already has a solve for this with weights: you would have your ODCR NodePool at a higher weight, and if that fails you would fall into the next NodePool (maybe spot), and so on. I think it's difficult to support the prioritization because an ODCR is tied to a launch template, which Karpenter creates behind the scenes. In order to support this fallback behavior, Karpenter would have to maintain a set of launch templates, which I suspect is too complicated. Note that Spot is supported because CreateFleet supports choosing the most optimized allocation strategy.
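The weighted-fallback idea above amounts to: try NodePools in descending weight order and take the first one that can launch. A toy Python sketch of that selection logic; the pool names and the can_launch predicate are illustrative, not Karpenter internals:

```python
def pick_node_pool(node_pools, can_launch):
    """Toy model of weighted fallback: try pools in descending weight
    order and return the name of the first that can satisfy the launch."""
    for pool in sorted(node_pools, key=lambda p: p["weight"], reverse=True):
        if can_launch(pool):
            return pool["name"]
    return None

pools = [
    {"name": "odcr", "weight": 100},      # reserved capacity first
    {"name": "spot", "weight": 50},       # then spot
    {"name": "on-demand", "weight": 10},  # last resort
]
# Simulate the ODCR pool being exhausted:
pick_node_pool(pools, lambda p: p["name"] != "odcr")  # -> "spot"
```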
Fwiw, we maintain many launch templates under the hood per node pool. It's required for things like labels and architectures. That said, we model "preferential fallback" using the weight feature, so I don't think it's crazy to reuse that here. However, LT combinatorics don't seem like they'll be a design constraint on this question. |
Ya, I realize I don't actually know that code very well, and I suspected I was wrong after writing this. But I agree it's likely better to push that logic to weights instead. I'm writing up a brief design for adding capacity reservation support to EC2NodeClasses, so I wanted to confirm that we want to keep the launch templates under a node class clean and minimal, for the sake of just launching the right node rather than for prioritization.
Tell us about your request
As an EKS platform engineer, I would like to reduce my platform cost and increase resiliency to capacity outages by provisioning AWS EC2 instances through the EC2 Fleet API with Capacity Reservations prioritized. Capacity Reservations are a mechanism for reserving capacity of particular instance types in particular AZs, which reduces cost and guards against capacity-acquisition issues at large scale.
I would like to be able to specify the reservations in the node template, e.g.
AWSTemplate.spec.on-demand-options.CapacityReservations
and to set use-capacity-reservations-first
in order to make sure the instances that launch are placed in the targeted capacity reservations in that account. Without this support, we could end up paying for both reserved capacity and On-Demand instances through EC2 Fleet.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
The problem is this workload is critical, and has a set SLA that we need to meet. Capacity reservations help us achieve tremendous vCPU counts without worrying about capacity, and currently we cannot use Karpenter's scalability and advantages at all.
Reserved Instances and open Capacity Reservations help, but they put the responsibility on the EKS platform engineer to constantly monitor the AZs and open reservation slots and feed them to the provisioner, which makes the application team less autonomous and adds complexity to managing the reservations. In addition, while targeted reservations open the door for specific components of an application to draw on different reservation amounts within a single account, there is currently no way to specify a reservation ID in the provisioner either.
Are you currently working around this issue?
There is no workaround; we have no mechanism to use Karpenter because it does not support specifying Capacity Reservations. Cluster Autoscaler does not support this either, so we currently have no way of guaranteeing the use of CRs with EKS on AWS unless we own the EC2 Fleet calls ourselves.
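Owning the EC2 Fleet call directly would look roughly like this: a CreateFleet request of type "instant" with UsageStrategy set to use-capacity-reservations-first, so matching reservations are consumed before regular On-Demand capacity. A boto3-shaped sketch, where the launch template ID and capacity are placeholders and the dict would be passed to ec2.create_fleet:

```python
def fleet_request(launch_template_id, target_capacity):
    """Build CreateFleet parameters that consume matching Capacity
    Reservations before plain On-Demand; pass to ec2.create_fleet(**params)."""
    return {
        "Type": "instant",  # use-capacity-reservations-first requires an instant fleet
        "LaunchTemplateConfigs": [{
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": launch_template_id,
                "Version": "$Latest",
            },
        }],
        "TargetCapacitySpecification": {
            "TotalTargetCapacity": target_capacity,
            "DefaultTargetCapacityType": "on-demand",
        },
        "OnDemandOptions": {
            "CapacityReservationOptions": {
                "UsageStrategy": "use-capacity-reservations-first",
            },
        },
    }

# Placeholder launch template ID.
params = fleet_request("lt-0123456789abcdef0", 10)
```

This is exactly the per-workload plumbing the feature request asks Karpenter to take over.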
Additional Context
No response
Attachments
No response