Commit

Update RFC with closed api decision
jonathan-innis committed Aug 20, 2024
1 parent c6839f8 commit c167a86
Showing 1 changed file with 73 additions and 101 deletions.
174 changes: 73 additions & 101 deletions designs/odcr.md
@@ -1,33 +1,64 @@
# On demand capacity reservation
# On-Demand Capacity Reservations

This document proposes supporting ODCR in Karpenter

- [On-Demand Capacity Reservations](#on-demand-capacity-reservations)
* [Background](#background)
+ [Capacity Reservations](#capacity-reservations)
* [Goals](#goals)
* [Non-Goals](#non-goals)
* [Capacity Reservation Selection](#capacity-reservation-selection)
+ [EC2NodeClass API](#ec2nodeclass-api)
+ [NodePool API](#nodepool-api)
* [Scheduling Representation](#scheduling-representation)
+ [Consider ODCR first during Scheduling](#consider-odcr-first-during-scheduling)
+ [Adding ODCRs as Additional Instance Type Offerings](#adding-odcrs-as-additional-instance-type-offerings)
+ [Representing ODCR Available Instance Counts in Instance Type Offerings](#representing-odcr-available-instance-counts-in-instance-type-offerings)
* [Capacity Reservation Expiration](#capacity-reservation-expiration)
* [CloudProvider Launch Behavior](#cloudprovider-launch-behavior)
* [Pricing/Consolidation](#pricingconsolidation)
+ [Provisioning](#provisioning)
+ [Consolidation](#consolidation)
- [Consolidating into Capacity Reserved Instances](#consolidating-into-capacity-reserved-instances)
- [Consolidating between Capacity Reservations](#consolidating-between-capacity-reservations)
* [Drift](#drift)
+ [The NodePool selects on `karpenter.sh/capacity-type: reserved` but the instance is no longer in a reservation](#the-nodepool-selects-on-karpentershcapacity-type-reserved-but-the-instance-is-no-longer-in-a-reservation)
+ [The `capacityReservationSelectorTerms` no longer selects an instance's capacity reservation](#the-capacityreservationselectorterms-no-longer-selects-an-instances-capacity-reservation)
* [Launch Failures](#launch-failures)
* [Open Questions](#open-questions)
* [Appendix](#appendix)
+ [Input/Output for CreateFleet with CapacityReservations](#inputoutput-for-createfleet-with-capacityreservations)

# On-Demand Capacity Reservations

This document proposes supporting ODCR in Karpenter

- [On demand capacity reservation](#on-demand-capacity-reservation)
- [Background](#background)
- [Capacity Reservations](#capacity-reservations)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Capacity Reservation Selection](#capacity-reservation-selection)
- [EC2NodeClass API](#ec2nodeclass-api)
- [NodePool API](#nodepool-api)
- [Node Labels](#node-labels)
- [Scheduling Representation](#scheduling-representation)
- [Consider ODCR first during Scheduling](#consider-odcr-first-during-scheduling)
- [Adding ODCRs as Additional Instance Type Offerings](#adding-odcrs-as-additional-instance-type-offerings)
- [Representing ODCR Available Instance Counts in Instance Type Offerings](#representing-odcr-available-instance-counts-in-instance-type-offerings)
- [Capacity Reservation Expiration](#capacity-reservation-expiration)
- [CloudProvider Launch Behavior](#cloudprovider-launch-behavior)
- [Pricing/Consolidation](#pricingconsolidation)
- [Provisioning](#provisioning)
- [Consolidation](#consolidation)
- [Consolidating into Capacity Reserved Instances](#consolidating-into-capacity-reserved-instances)
- [Consolidating between Capacity Reservations](#consolidating-between-capacity-reservations)
- [Drift](#drift)
- [The NodePool selects on `karpenter.k8s.aws/capacity-reservation-id` with the `Exists` operator but the instance is no longer in a reservation](#the-nodepool-selects-on-karpenterk8sawscapacity-reservation-id-with-the-exists-operator-but-the-instance-is-no-longer-in-a-reservation)
- [The `capacityReservationSelectorTerms` no longer selects an instances `karpenter.k8s.aws/capacity-reservation-id` value](#the-capacityreservationselectorterms-no-longer-selects-an-instances-karpenterk8sawscapacity-reservation-id-value)
- [Launch Failures](#launch-failures)
- [Appendix](#appendix)
- [Input/Output for CreateFleet with CapacityReservations](#inputoutput-for-createfleet-with-capacityreservations)
* [Background](#background)
+ [Capacity Reservations](#capacity-reservations)
* [Goals](#goals)
* [Non-Goals](#non-goals)
* [Capacity Reservation Selection](#capacity-reservation-selection)
+ [EC2NodeClass API](#ec2nodeclass-api)
+ [NodePool API](#nodepool-api)
* [Node Labels](#node-labels)
* [Scheduling Representation](#scheduling-representation)
+ [Consider ODCR first during Scheduling](#consider-odcr-first-during-scheduling)
+ [Adding ODCRs as Additional Instance Type Offerings](#adding-odcrs-as-additional-instance-type-offerings)
+ [Representing ODCR Available Instance Counts in Instance Type Offerings](#representing-odcr-available-instance-counts-in-instance-type-offerings)
* [Capacity Reservation Expiration](#capacity-reservation-expiration)
* [CloudProvider Launch Behavior](#cloudprovider-launch-behavior)
* [Pricing/Consolidation](#pricingconsolidation)
+ [Provisioning](#provisioning)
+ [Consolidation](#consolidation)
- [Consolidating into Capacity Reserved Instances](#consolidating-into-capacity-reserved-instances)
- [Consolidating between Capacity Reservations](#consolidating-between-capacity-reservations)
* [Drift](#drift)
+ [The NodePool selects on `karpenter.sh/capacity-type: reserved` but the instance is no longer in a reservation](#the-nodepool-selects-on-karpentershcapacity-type-reserved-but-the-instance-is-no-longer-in-a-reservation)
+ [The `capacityReservationSelectorTerms` no longer selects an instance's capacity reservation](#the-capacityreservationselectorterms-no-longer-selects-an-instances-capacity-reservation)
* [Launch Failures](#launch-failures)
* [Appendix](#appendix)
+ [Input/Output for CreateFleet with CapacityReservations](#inputoutput-for-createfleet-with-capacityreservations)

## Background

@@ -143,10 +174,9 @@ This API follows closely with how [DescribeCapacityReservations](https://docs.aw
### NodePool API
The EC2NodeClass API allows selection on capacity reservations, which give additional options to the scheduler to choose from when launching instance types; however, it does not offer a mechanism to scope-down whether instances in a NodePool should only launch into an ODCR, fallback between a capacity reservation to on-demand if none is available, or fallback between a capacity reservation to spot and then finally to on-demand. There are effectively 3 options we could take to allow this kind of selection:
The EC2NodeClass API allows selection on capacity reservations, which gives the scheduler additional options to choose from when launching instance types; however, it does not offer a mechanism to scope down whether instances in a NodePool should launch only into an ODCR, fall back from a capacity reservation to on-demand if none is available, or fall back from a capacity reservation to spot and then finally to on-demand.
1. Constrain a NodePool to only launch ODCR capacity when it references an EC2NodeClass that has `capacityReservationSelectorTerms`. Fallback to OD and to spot requires additional NodePools of lower weights
2. Allow a NodePool to launch all types of capacity when `capacityReservationSelectorTerms` are provided, but create an additional `karpenter.k8s.aws/capacity-type` called `reserved`. A cluster admin could then select to support only launching ODCR capacity and falling back between ODCR capacity to on-demand capacity respectively. _NOTE: This option requires any applications (pods) that are using node selection on `karpenter.k8s.aws/capacity-type: "on-demand"` to expand their selection to include `reservation`_
This RFC proposes the addition of a new `karpenter.sh/capacity-type` label value, called `reserved`. A cluster admin could then choose between launching only ODCR capacity and falling back from ODCR capacity to on-demand capacity. _NOTE: This option requires any applications (pods) that use node selection on `karpenter.sh/capacity-type: "on-demand"` to expand their selection to include `reserved`, or to switch to a `NotIn` node affinity on `karpenter.sh/capacity-type: spot`._

a. Only launch ODCR instances
```yaml
@@ -156,7 +186,7 @@ The EC2NodeClass API allows selection on capacity reservations, which give addit
name: default
spec:
requirements:
- key: karpenter.k8s.aws/capacity-type
- key: karpenter.sh/capacity-type
operator: In
values: ["reserved"]
```
@@ -168,7 +198,7 @@ The EC2NodeClass API allows selection on capacity reservations, which give addit
name: default
spec:
requirements:
- key: karpenter.k8s.aws/capacity-type
- key: karpenter.sh/capacity-type
operator: In
values: ["on-demand", "reserved"]
```
@@ -182,69 +212,12 @@ The EC2NodeClass API allows selection on capacity reservations, which give addit
# No additional requirements needed, launch all capacity types by default
requirements: []
```
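
The note above about pods that currently select on `on-demand` can be illustrated with a minimal sketch of the `NotIn` variant. The pod name, container name, and image are placeholders and are not part of this RFC; only the `karpenter.sh/capacity-type` key and its values come from the proposal.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app            # hypothetical name, for illustration only
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # Excluding spot keeps both on-demand and reserved nodes eligible,
              # so the pod does not need to name `reserved` explicitly.
              - key: karpenter.sh/capacity-type
                operator: NotIn
                values: ["spot"]
  containers:
    - name: app                # placeholder container
      image: registry.example.com/app:latest   # placeholder image
```

A pod that prefers to stay explicit could instead use the `In` operator with `values: ["on-demand", "reserved"]`, which mirrors the NodePool requirement shown in example b.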
3. [Recommended] Allow a NodePool to launch all types of capacity when `capacityReservationSelectorTerms` are provided, but create an additional `karpenter.k8s.aws/capacity-reservation-id`. A cluster admin could then select to support only launching ODCR capacity and falling back between ODCR capacity to on-demand capacity respectively. Unlike Option 2, this does not require applications (pods) that are selecting on `karpenter.k8s.aws/capacity-type: on-demand` to change anything about their selection.

a. Only launch ODCR instances
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: default
spec:
requirements:
- key: karpenter.k8s.aws/capacity-type
operator: In
values: ["on-demand"]
- key: karpenter.k8s.aws/capacity-reservation-id
operator: Exists
```
b. Launch ODCR instances with on-demand fallback
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: default
spec:
requirements:
- key: karpenter.k8s.aws/capacity-type
operator: In
values: ["on-demand"]
```
c. Launch ODCR instances with spot and on-demand fallback
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
name: default
spec:
# No additional requirements needed, launch all capacity types by default
requirements: []
```

We are recommending Option 3. Notably, using capacity reservations is not an application owner concern -- app owners only care that they get the capacity they need to run their applications, they don't care how that capacity is acquired. This option does not require application owners to intervene to begin leveraging capacity reservations and allows cluster admins to describe the constraints necessary to ensure capacity availability for harder-to-get instance types.

## Node Labels

When a node is launched against a CapacityReservation we will expose Capacity Reservation
information as labels `karpenter.k8s.aws/capacity-reservation-id`. This is helpful for users to identify those nodes that are being used
by a capacity reservation and the id of the reservation. Scenarios such as tracking how many nodes are under each capacity reservation and then
checking how close to limit of the reservation.

```yaml
Name: example-node
Labels: karpenter.k8s.aws/capacity-reservation-id=cr-12345
karpenter.sh/capacity-type=on-demand
```

`karpenter.k8s.aws/capacity-reservation-id` will be the capacity reservation the node launched from.

We will propagate this information via [instance](https://github.com/aws/karpenter-provider-aws/blob/main/pkg/providers/instance/types.go#L29) by extracting it from [DescribeInstance](https://github.com/aws/karpenter-provider-aws/blob/main/pkg/batcher/describeinstances.go#L48) ([aws doc](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CapacityReservationSpecificationResponse.html)).

## Scheduling Representation

Since ODCRs are an AWS-specific concept, there needs to be a mechanism to pass these ODCR options down for the scheduler to reason about. Importantly, we need the scheduler to know to prioritize these ODCR options when a user has specified them in their EC2NodeClass. Further, we need the scheduler to be aware that it can't launch an unlimited number of these instances into an ODCR.

_Note: Updates to the scheduling representation of our offerings, including changes to how the core scheduling behavior works are going to require extensive performance benchmarking to ensure that we do not significantly trade-off performance when we support users selecting on ODCRs._
_Note: Updates to the scheduling representation of our offerings, including changes to how the core scheduling behavior works are going to require extensive performance benchmarking to ensure that we do not significantly tradeoff performance when we support users selecting on ODCRs._

### Consider ODCR first during Scheduling

@@ -266,10 +239,7 @@ offerings:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["on-demand"]
- key: karpenter.k8s.aws/capacity-reservation-id
operator: In
values: ["cr-111111111", "cr-222222222"]
values: ["reserved"]
- key: topology.kubernetes.io/zone
operator: In
values: ["us-west-2a"]
@@ -322,10 +292,7 @@ offerings:
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["on-demand"]
- key: karpenter.k8s.aws/capacity-reservation-id
operator: In
values: ["cr-111111111", "cr-222222222"]
values: ["reserved"]
- key: topology.kubernetes.io/zone
operator: In
values: ["us-west-2a"]
@@ -406,15 +373,15 @@ In practice, this means that if a user has two capacity reservation offerings av

Due to capacity reservation expiration, or due to changes the user makes to the capacity reservation selection in their EC2NodeClass, instances can drift from the NodePool capacity reservation specification. The sections below outline two ways that this drift can occur and how it will be reconciled.

### The NodePool selects on `karpenter.k8s.aws/capacity-reservation-id` with the `Exists` operator but the instance is no longer in a reservation
### The NodePool selects on `karpenter.sh/capacity-type: reserved` but the instance is no longer in a reservation

Assuming we have a reconciler that we will build as part of this design proposal that will remove the `karpenter.k8s.aws/capacity-reservation-id` label when a node is no longer in a capacity reservation, Karpenter's dynamic requirement drift checking should cause drift reconciliation in this case.
Assuming we have a reconciler that we will build as part of this design proposal that will update the `karpenter.sh/capacity-type` label from `reserved` to `on-demand` when a node is no longer in a capacity reservation, Karpenter's dynamic requirement drift checking should cause drift reconciliation in this case.

In this case, since the NodePool is selecting on a label that does not exist on the NodeClaim, drift detection will recognize that the NodeClaim is invalid and will mark it as drifted to be replaced by another node.
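
As an illustration of the label change described above, the sketch below shows a hypothetical node before and after it leaves its capacity reservation. The node name is made up for the example; only the `karpenter.sh/capacity-type` label and its values come from this proposal.

```yaml
# Before: the instance is still running inside a capacity reservation.
apiVersion: v1
kind: Node
metadata:
  name: example-node           # hypothetical node name
  labels:
    karpenter.sh/capacity-type: reserved
---
# After: the proposed reconciler has observed that the instance is no longer
# in a reservation and has rewritten the label value.
apiVersion: v1
kind: Node
metadata:
  name: example-node
  labels:
    karpenter.sh/capacity-type: on-demand
```

A NodePool whose requirements allow only `reserved` would no longer be satisfied by the second form of the node, which is the condition that drift detection is expected to pick up.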

### The `capacityReservationSelectorTerms` no longer selects an instances `karpenter.k8s.aws/capacity-reservation-id` value
### The `capacityReservationSelectorTerms` no longer selects an instance's capacity reservation

In this case, there is no existing mechanism in Karpenter that would catch this. Karpenter will need to implement an additional mechanism that validates that a node's `karpenter.k8s.aws/capacity-reservation-id` value falls within the valid set of reservations selected-on from the `capacityReservationSelectorTerms`. Specifically, it needs to validate that that id exists with the `capacityReservation` section of the EC2NodeClass status.
In this case, there is no existing mechanism in Karpenter that would catch this. Karpenter will need to implement an additional mechanism that validates that an instance's capacity reservation falls within the set of reservations selected by the `capacityReservationSelectorTerms`. Specifically, it needs to validate that the reservation's ID exists within the `capacityReservation` section of the EC2NodeClass status.

## Launch Failures

@@ -423,6 +390,11 @@ The main failure scenario is when Capacity Reservation limit is hit and no new n
1. We filter inside Karpenter before calling the CreateFleet API and throw an InsufficientCapacityError, which causes a reevaluation; the retry then recalculates the instances and may fall back to regular on-demand
2. We call the CreateFleet API under certain race conditions, which results in an InsufficientCapacityError and a reevaluation; a fallback to on-demand could then be selected if capacity reservations are not available

## Open Questions

1. How do we deal with selecting on an incredibly large array of capacity reservations that we would have to store in the NodeClass status? How do we ensure that this doesn't significantly bloat the output?
2. Do we introduce this feature with a `FEATURE_GATE=CapacityReservations`? This would give us the flexibility to change the implementation of the feature while it's still in alpha.

## Appendix

### Input/Output for CreateFleet with CapacityReservations