feat: RFC Implementation Supporting ODCR #6198

Open, wants to merge 1 commit into base: main

@tvonhacht-apple (Contributor) commented May 14, 2024

This is a collaboration on implementing RFC #5716, Supporting ODCRs (On-Demand Capacity Reservations).

Progress

TODOs

  • [ ] Agree on instanceMatchCriteria vs type
  • [ ] Remove all log statements

Description

Support associating ODCRs with an EC2NodeClass

Add a new field capacityReservationSelectorTerms to EC2NodeClass

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: example-node-class
spec:
  capacityReservationSelectorTerms:
    - # The Availability Zone of the Capacity Reservation
      availabilityZone: String | None
      # The ID of the Capacity Reservation
      id: String | None
      # The instance type for which the Capacity Reservation reserves capacity
      instanceType: String | None
      # The ID of the Amazon Web Services account that owns the Capacity Reservation
      ownerId: String | None
      # Tags is a map of key/value tags used to select Capacity Reservations
      # Specifying '*' for a value selects all values for a given tag key.
      tags: Map | None
      # Indicates the type of instance launches that the Capacity Reservation accepts. The options include:
      #    - open:
      #       The Capacity Reservation accepts all instances that have
      #       matching attributes (instance type, platform, and Availability
      #       Zone). Instances that have matching attributes launch into the
      #       Capacity Reservation automatically without specifying any
      #       additional parameters.
      #    - targeted:
      #       The Capacity Reservation only accepts instances that
      #       have matching attributes (instance type, platform, and
      #       Availability Zone), and explicitly target the Capacity
      #       Reservation. This ensures that only permitted instances can use
      #       the reserved capacity.
      type: String | None
status:
  capacityReservations:
    - # AvailabilityZone of the Capacity Reservation
      availabilityZone: String
      # Available Instance Count of the Capacity Reservation
      availableInstanceCount: Integer
      # The date and time at which the Capacity Reservation expires. When a Capacity
      # Reservation expires, the reserved capacity is released and you can no longer
      # launch instances into it. The Capacity Reservation's state changes to expired
      # when it reaches its end date and time.
      endDate: String | None
      # Indicates the way in which the Capacity Reservation ends. A Capacity Reservation
      # can have one of the following end types:
      #   * unlimited - The Capacity Reservation remains active until you explicitly
      #   cancel it.
      #   * limited - The Capacity Reservation expires automatically at a specified
      #   date and time.
      endDateType: String
      # ID of the Capacity Reservation
      id: String
      # Indicates the type of instance launches that the Capacity Reservation accepts. The options include:
      #   - open:
      #       The Capacity Reservation accepts all instances that have
      #       matching attributes (instance type, platform, and Availability
      #       Zone). Instances that have matching attributes launch into the
      #       Capacity Reservation automatically without specifying any
      #       additional parameters.
      #   - targeted:
      #       The Capacity Reservation only accepts instances that
      #       have matching attributes (instance type, platform, and
      #       Availability Zone), and explicitly target the Capacity
      #       Reservation. This ensures that only permitted instances can use
      #       the reserved capacity.
      instanceMatchCriteria: String
      # Instance Platform of the Capacity Reservation
      instancePlatform: String
      # Instance Type of the Capacity Reservation
      instanceType: String
      # Owner Id of the Capacity Reservation
      ownerId: String
      # The date and time at which the Capacity Reservation was started.
      startDate: String
      # Total Instance Count of the Capacity Reservation
      totalInstanceCount: Integer

Adding a new label karpenter.k8s.aws/capacity-reservation-id to the NodeClaim/Node

So that Karpenter (and admins) can tell whether a NodeClaim/Node is part of an ODCR, Karpenter adds the label karpenter.k8s.aws/capacity-reservation-id containing the Capacity Reservation ID (for example, cr-12345678).

The selector terms closely follow the filters supported by EC2 DescribeCapacityReservations, though not every filter is implemented.

  • The only exception: instanceMatchCriteria is called type

Karpenter validates the spec before creating the LaunchTemplates to ensure there are no violations.

How was this change tested?

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@tvonhacht-apple requested a review from a team as a code owner May 14, 2024 05:57

netlify bot commented May 14, 2024: Deploy Preview for karpenter-docs-prod canceled.

🔨 Latest commit: 1089159
🔍 Latest deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/6685b8b2be908a0008f3a0b2

@@ -154,12 +157,63 @@ func (r Resolver) Resolve(ctx context.Context, nodeClass *v1beta1.EC2NodeClass,
maxPods: int(instanceType.Capacity.Pods().Value()),
}
})

zones := scheduling.NewNodeSelectorRequirementsWithMinValues(nodeClaim.Spec.Requirements...).Get(v1.LabelTopologyZone)
capacityReservations := []v1beta1.CapacityReservation{}
Contributor Author

I think we can handle this within the if below; we don't need to create this variable beforehand.

@tvonhacht-apple force-pushed the feature/odcr branch 2 times, most recently from 96951d1 to 5471bc4 May 21, 2024 05:49
@@ -96,6 +96,9 @@ func (c *CloudProvider) Create(ctx context.Context, nodeClaim *corev1beta1.NodeC
}
instance, err := c.instanceProvider.Create(ctx, nodeClass, nodeClaim, instanceTypes)
if err != nil {
if cloudprovider.IsInsufficientCapacityError(err) {
Contributor

If we already get an ICE error back, we shouldn't have to wrap it again. When we check down the line, we should still be able to identify it as an ICE error, as long as one of the errors in the wrapped chain is one.

Contributor Author

From my testing, that wasn't working: it becomes a normal error (see line 102, where it's just a string at that point).


zones := scheduling.NewNodeSelectorRequirementsWithMinValues(nodeClaim.Spec.Requirements...).Get(v1.LabelTopologyZone)
capacityReservations := []v1beta1.CapacityReservation{}
if capacityType == "capacity-reservation" {
Contributor

If we select a capacity-reservation NodeClaim, should we just make a best-effort launch here and let the NodeClaim be deleted with an ICE error from Fleet if there isn't any available capacity? The next iteration of GetInstanceTypes should have the updated capacity reservation availability, so we shouldn't try to launch with the same offering again on the second attempt.

Contributor Author

the benefit of this is that it becomes visible in nodeclaims that something is happening?

Contributor

> the benefit of this is that it becomes visible in nodeclaims that something is happening

It's more that we made the decision to launch it, so we should probably follow it through. I get that you are trying to be a little smarter, since we know the launch most likely won't succeed, but in general I'd like to avoid introducing too much modality into the code around specific logic like this.

I've also been thinking about this more, and I realize that if we don't find an available capacity reservation for this launch earlier in our Create call (there's a filterInstanceTypes that should look for available offerings), I'd expect us to return an ICE there. If we've already gotten past that point in the code, it seems we are safe to proceed with the launch. We may still need to see exactly which instance type offerings have ODCR availability when we create the launch templates, but I'd expect that to come through GetInstanceTypes() rather than through the EC2NodeClass status.

zones := scheduling.NewNodeSelectorRequirementsWithMinValues(nodeClaim.Spec.Requirements...).Get(v1.LabelTopologyZone)
capacityReservations := []v1beta1.CapacityReservation{}
if capacityType == "capacity-reservation" {
for _, capacityReservation := range nodeClass.Status.CapacityReservations {
Contributor

Could this use lo.Filter?
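If the loop only keeps reservations that are in one of the requested zones and still have available instances (an assumption about what the loop body does), it could collapse into a single filter call. The sketch below uses a local generic filter with the same (item, index) predicate shape as lo.Filter so it runs without dependencies; capacityReservation is a trimmed, hypothetical mirror of the status entry, and a plain map stands in for the zone requirement.

```go
package main

import "fmt"

// capacityReservation is a trimmed-down, hypothetical mirror of the status entry.
type capacityReservation struct {
	ID                     string
	AvailabilityZone       string
	InstanceType           string
	AvailableInstanceCount int
}

// filter mirrors the shape of lo.Filter: keep items for which predicate returns true.
func filter[T any](items []T, predicate func(item T, index int) bool) []T {
	var out []T
	for i, item := range items {
		if predicate(item, i) {
			out = append(out, item)
		}
	}
	return out
}

// usableReservations keeps reservations in an allowed zone that still have capacity.
func usableReservations(crs []capacityReservation, zones map[string]bool) []capacityReservation {
	return filter(crs, func(cr capacityReservation, _ int) bool {
		return zones[cr.AvailabilityZone] && cr.AvailableInstanceCount > 0
	})
}

func main() {
	crs := []capacityReservation{
		{ID: "cr-1", AvailabilityZone: "us-west-2a", InstanceType: "m5.large", AvailableInstanceCount: 2},
		{ID: "cr-2", AvailabilityZone: "us-west-2b", InstanceType: "m5.large", AvailableInstanceCount: 0},
		{ID: "cr-3", AvailabilityZone: "us-west-2c", InstanceType: "c5.large", AvailableInstanceCount: 1},
	}
	zones := map[string]bool{"us-west-2a": true, "us-west-2b": true}
	for _, cr := range usableReservations(crs, zones) {
		fmt.Println(cr.ID) // cr-1
	}
}
```

An empty result is then the point at which the existing ICE error would be returned.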

return nil, cloudprovider.NewInsufficientCapacityError(fmt.Errorf("trying to resolve capacity-reservation but no available capacity reservations available"))
}
}

for params, instanceTypes := range paramsToInstanceTypes {
Contributor

When we group paramsToInstanceTypes above, should we also group by capacity reservation? That would let us keep the same logic on L178, where we iterate through the params and instance types. I think this may also work better, since only specific instance types are valid for a given capacity reservation.
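A minimal sketch of the grouping suggested above, assuming the launch template parameters can be reduced to a comparable key (a string here) and that each reservation is valid for exactly one instance type. groupKey and groupByParamsAndReservation are hypothetical names. Each resulting group maps to one launch template that targets one reservation, so the existing params/instance types loop could stay as it is.

```go
package main

import "fmt"

type capacityReservation struct {
	ID           string
	InstanceType string
}

// groupKey is a hypothetical composite key: the launch template parameters
// (reduced to a string here) plus the capacity reservation ID.
type groupKey struct {
	Params        string
	ReservationID string
}

// groupByParamsAndReservation splits each params group so that only the
// instance types a reservation can serve end up in that reservation's group.
func groupByParamsAndReservation(paramsToInstanceTypes map[string][]string, crs []capacityReservation) map[groupKey][]string {
	out := map[groupKey][]string{}
	for params, instanceTypes := range paramsToInstanceTypes {
		for _, cr := range crs {
			for _, it := range instanceTypes {
				if it == cr.InstanceType {
					k := groupKey{Params: params, ReservationID: cr.ID}
					out[k] = append(out[k], it)
				}
			}
		}
	}
	return out
}

func main() {
	groups := groupByParamsAndReservation(
		map[string][]string{"params-a": {"m5.large", "c5.large"}},
		[]capacityReservation{{ID: "cr-1", InstanceType: "m5.large"}},
	)
	for k, its := range groups {
		fmt.Println(k.Params, k.ReservationID, its) // params-a cr-1 [m5.large]
	}
}
```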

@@ -71,6 +71,12 @@ cat << EOF > controller-policy.json
"Resource": "arn:${AWS_PARTITION}:eks:${AWS_REGION}:${AWS_ACCOUNT_ID}:cluster/${CLUSTER_NAME}",
"Sid": "EKSClusterEndpointLookup"
},
{
"Effect": "Allow",
"Action": "eks:DescribeCapacityReservations",
Contributor

Suggested change
- "Action": "eks:DescribeCapacityReservations",
+ "Action": "ec2:DescribeCapacityReservations",

Contributor Author

done

@@ -48,6 +48,7 @@ var (
"UnfulfillableCapacity",
"Unsupported",
"InsufficientFreeAddressesInSubnet",
"ReservationCapacityExceeded",
Contributor

When we return this type of ICE error, should we short-circuit and update the capacity reservation we launched with in place, so we don't have to wait for another iteration of capacity reservation polling to update the instance availability?

If we didn't want to set this directly to 0, we could also use it as a trigger to call DescribeCapacityReservations again, since we know something has changed since we originally made the launch decision.

Contributor Author

I think setting it to zero at this point would be OK. It will eventually be refetched anyway, and this lets other evaluations happening at the same time see the change immediately.
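A sketch of the "set it to zero" approach, assuming availability is held in a hypothetical in-memory map keyed by capacity reservation ID and shared by concurrent launches. MarkExhausted zeroes the entry as soon as CreateFleet returns ReservationCapacityExceeded, and the next DescribeCapacityReservations poll (Refresh here) overwrites it with fresh counts.

```go
package main

import (
	"fmt"
	"sync"
)

// reservationAvailability is a hypothetical in-memory view of available
// instance counts per capacity reservation ID, shared across concurrent launches.
type reservationAvailability struct {
	mu     sync.RWMutex
	counts map[string]int
}

func newReservationAvailability() *reservationAvailability {
	return &reservationAvailability{counts: map[string]int{}}
}

// Refresh replaces the view with the result of a DescribeCapacityReservations poll.
func (r *reservationAvailability) Refresh(counts map[string]int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.counts = make(map[string]int, len(counts))
	for id, n := range counts {
		r.counts[id] = n
	}
}

// MarkExhausted zeroes a reservation after a ReservationCapacityExceeded error
// so concurrent evaluations stop selecting it before the next poll.
func (r *reservationAvailability) MarkExhausted(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.counts[id]; ok {
		r.counts[id] = 0
	}
}

// Available returns the last known available count (0 if unknown).
func (r *reservationAvailability) Available(id string) int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.counts[id]
}

func main() {
	ra := newReservationAvailability()
	ra.Refresh(map[string]int{"cr-12345678": 3})
	ra.MarkExhausted("cr-12345678")
	fmt.Println(ra.Available("cr-12345678")) // 0
}
```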

Contributor

Yep, I guess it all sort of comes out in the wash. I'm a little skeptical that we can model this with the existing ICE cache on its own. At a minimum, we need to update the ICE cache to reflect that the ICE applies only to that particular capacity reservation. Without that, we'd either not try the OD instance in that AZ at all, or we wouldn't try other capacity reservations assigned to that instance type in that zone. Either way, we aren't really reacting correctly to what the CreateFleet call is telling us here.
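One way to scope the ICE to a single reservation, sketched under the assumption that the unavailable-offerings cache is keyed by a struct: add the reservation ID to the key. unavailableKey and unavailableOfferings are hypothetical and omit the TTL a real ICE cache would need. Marking cr-1 unavailable leaves both cr-2 and plain on-demand in the same instance type and zone eligible, which avoids both failure modes described above.

```go
package main

import (
	"fmt"
	"sync"
)

// unavailableKey is a hypothetical ICE-cache key. Adding ReservationID scopes a
// ReservationCapacityExceeded error to one reservation instead of the whole
// instance type and zone.
type unavailableKey struct {
	CapacityType  string
	InstanceType  string
	Zone          string
	ReservationID string // empty for offerings not backed by a reservation
}

// unavailableOfferings records keys that recently returned an ICE error.
// A real cache would also expire entries after a TTL; that is omitted here.
type unavailableOfferings struct {
	mu   sync.Mutex
	seen map[unavailableKey]bool
}

func (u *unavailableOfferings) MarkUnavailable(k unavailableKey) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if u.seen == nil {
		u.seen = map[unavailableKey]bool{}
	}
	u.seen[k] = true
}

func (u *unavailableOfferings) IsUnavailable(k unavailableKey) bool {
	u.mu.Lock()
	defer u.mu.Unlock()
	return u.seen[k] // reading a nil map is safe and returns false
}

func main() {
	var cache unavailableOfferings
	cache.MarkUnavailable(unavailableKey{"capacity-reservation", "m5.large", "us-west-2a", "cr-1"})

	// Other reservations and plain on-demand in the same zone stay eligible.
	fmt.Println(cache.IsUnavailable(unavailableKey{"capacity-reservation", "m5.large", "us-west-2a", "cr-1"})) // true
	fmt.Println(cache.IsUnavailable(unavailableKey{"capacity-reservation", "m5.large", "us-west-2a", "cr-2"})) // false
	fmt.Println(cache.IsUnavailable(unavailableKey{"on-demand", "m5.large", "us-west-2a", ""}))                 // false
}
```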

pkg/apis/v1beta1/ec2nodeclass_status.go: 3 resolved conversations
pkg/apis/v1beta1/ec2nodeclass.go: 1 resolved conversation
@jonathan-innis self-assigned this Jun 7, 2024
@tvonhacht-apple force-pushed the feature/odcr branch 2 times, most recently from b7e2828 to 0f3ab47 June 18, 2024 20:29
@tvonhacht-apple force-pushed the feature/odcr branch 7 times, most recently from 140b2c3 to b8512cf July 3, 2024 19:36


// Reservation. This ensures that only permitted instances can use
// the reserved capacity.
// +optional
Type string `json:"type,omitempty"`
Contributor

Should this be an enum, validated through the kubebuilder OpenAPI markers so that only the valid values are accepted? See https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/apis/v1beta1/nodepool.go#L76 for an example.
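A minimal sketch of the enum suggestion, using the kubebuilder validation marker syntax documented in the kubebuilder book. The struct is a trimmed, hypothetical version of the selector term. The marker only affects the CRD schema that controller-gen generates; validType is an illustrative Go-side mirror of the same rule, not something the marker produces.

```go
package main

import "fmt"

// CapacityReservationSelectorTerm is a trimmed, hypothetical version of the
// selector term. The Enum marker asks controller-gen to emit an OpenAPI enum so
// the API server rejects other values; to the Go compiler it is only a comment.
type CapacityReservationSelectorTerm struct {
	// +kubebuilder:validation:Enum=open;targeted
	// +optional
	Type string `json:"type,omitempty"`
}

// validType mirrors the schema rule in Go: empty (field omitted) or an enum value.
func validType(t string) bool {
	switch t {
	case "", "open", "targeted":
		return true
	default:
		return false
	}
}

func main() {
	term := CapacityReservationSelectorTerm{Type: "targeted"}
	fmt.Println(validType(term.Type)) // true
	fmt.Println(validType("closed"))  // false
}
```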

// * limited - The Capacity Reservation expires automatically at a specified
// date and time.
// +required
EndDateType string `json:"endDateType"`
Contributor

Can this just be implied from the presence or absence of the end date?

// launch instances into it. The Capacity Reservation's state changes to expired
// when it reaches its end date and time.
// +optional
EndDate *string `json:"endDate,omitempty"`
Contributor

Suggested change
- EndDate *string `json:"endDate,omitempty"`
+ EndDate *time.Time `json:"endDate,omitempty"`

Can this map to the actual Go type?

// Reservation. This ensures that only permitted instances can use
// the reserved capacity.
// +required
InstanceMatchCriteria string `json:"instanceMatchCriteria"`
Contributor

If we're calling this type above, we should probably match that here.

offering.Requirements.Add(scheduling.NewRequirement(v1beta1.LabelTopologyZoneID, v1.NodeSelectorOpIn, subnet.ZoneID))
}
// TODO: tvonhacht
offering.Requirements.Add(scheduling.NewRequirement(v1beta1.LabelCapactiyReservationID, v1.NodeSelectorOpIn, capacityReservation.ID))
Contributor

How do we make sure that we don't create duplicate offerings for multiple ODCRs that have the same zone and instance type?
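One possible answer, sketched with hypothetical types: aggregate reservations by (instance type, zone) so each pair yields a single offering that lists every reservation able to back it, with their available counts summed. Whether the offering should then carry all IDs in one In requirement or pick a single reservation is a design choice the sketch does not settle.

```go
package main

import (
	"fmt"
	"sort"
)

type capacityReservation struct {
	ID                     string
	InstanceType           string
	AvailabilityZone       string
	AvailableInstanceCount int
}

// offeringKey identifies an offering by instance type and zone.
type offeringKey struct {
	InstanceType string
	Zone         string
}

// reservedOffering is a hypothetical aggregate: one offering per key, listing
// every reservation that can back it and their combined available count.
type reservedOffering struct {
	ReservationIDs []string
	Available      int
}

func dedupeOfferings(crs []capacityReservation) map[offeringKey]*reservedOffering {
	out := map[offeringKey]*reservedOffering{}
	for _, cr := range crs {
		k := offeringKey{cr.InstanceType, cr.AvailabilityZone}
		o, ok := out[k]
		if !ok {
			o = &reservedOffering{}
			out[k] = o
		}
		o.ReservationIDs = append(o.ReservationIDs, cr.ID)
		o.Available += cr.AvailableInstanceCount
	}
	for _, o := range out {
		sort.Strings(o.ReservationIDs) // deterministic order
	}
	return out
}

func main() {
	offerings := dedupeOfferings([]capacityReservation{
		{ID: "cr-b", InstanceType: "m5.large", AvailabilityZone: "us-west-2a", AvailableInstanceCount: 1},
		{ID: "cr-a", InstanceType: "m5.large", AvailabilityZone: "us-west-2a", AvailableInstanceCount: 2},
	})
	o := offerings[offeringKey{"m5.large", "us-west-2a"}]
	fmt.Println(len(offerings), o.ReservationIDs, o.Available) // 1 [cr-a cr-b] 3
}
```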

offering.Requirements.Add(scheduling.NewRequirement(v1beta1.LabelCapactiyReservationID, v1.NodeSelectorOpIn, capacityReservation.ID))
log.FromContext(ctx).WithValues("instanceType", *instanceType.InstanceType, "capacityReservation", capacityReservation.ID, "offering", offering).V(0).Info("offering for capacity reservation and instanceType")
offerings = append(offerings, offering)
instanceTypeOfferingAvailable.With(prometheus.Labels{
Contributor

I wonder if we should punt on metrics for capacity reservation instance type availability/price for now. This gets trickier because there's no longer a concept of "global" availability and price: availability is now dictated per NodePool, since the EC2NodeClass dictates the offerings surfaced for each one. Maybe we need to add a NodePool dimension to make this more accurate, but that would increase the cardinality even more than we already have (and this is a pretty high-cardinality metric to begin with).

capacityTypeLabel: capacityType,
zoneLabel: zone,
}).Set(float64(lo.Ternary(available, 1, 0)))
instanceTypeOfferingPriceEstimate.With(prometheus.Labels{
Contributor

Same comment here. Maybe worth punting on the metric since it's going to make things more challenging to reason about

@@ -64,6 +64,19 @@ var (
zoneLabel,
},
)
instanceTypeOfferingAvailableCapacityReservation = prometheus.NewGaugeVec(
Contributor

I wonder if we should punt on this too. Ideally, we can just model this through our standard offering availability.

@@ -58,43 +58,58 @@ type Provider interface {
DeleteAll(context.Context, *v1beta1.EC2NodeClass) error
InvalidateCache(context.Context, string, string)
ResolveClusterCIDR(context.Context) error
GetCapacityReservationID(launchTemplateName string) *string
Contributor

I wonder if, rather than relying on a launch template lookup, we can just keep a map of instance type/AZ -> capacity reservation ID during launch, so that if a given instance type/AZ is chosen, we know which capacity reservation ID it belongs to. This might reduce cross-provider lookups and additional dependencies.
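A minimal sketch of that map, assuming the reservations selected for a launch are known when the fleet request is built. placement and buildReservationIndex are hypothetical; when two reservations share a placement (the duplication question raised earlier), this version simply keeps the first.

```go
package main

import "fmt"

type capacityReservation struct {
	ID               string
	InstanceType     string
	AvailabilityZone string
}

// placement is the (instance type, zone) pair that CreateFleet reports back.
type placement struct {
	InstanceType string
	Zone         string
}

// buildReservationIndex maps each placement to the reservation that backs it,
// so no launch template lookup is needed after the launch.
func buildReservationIndex(crs []capacityReservation) map[placement]string {
	idx := make(map[placement]string, len(crs))
	for _, cr := range crs {
		p := placement{cr.InstanceType, cr.AvailabilityZone}
		if _, exists := idx[p]; !exists { // first reservation wins for a shared placement
			idx[p] = cr.ID
		}
	}
	return idx
}

func main() {
	idx := buildReservationIndex([]capacityReservation{
		{ID: "cr-12345678", InstanceType: "m5.large", AvailabilityZone: "us-west-2a"},
	})
	// After CreateFleet reports the chosen instance type and zone:
	chosen := placement{"m5.large", "us-west-2a"}
	if id, ok := idx[chosen]; ok {
		fmt.Println(id) // cr-12345678
	}
}
```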

Labels: None yet
Projects: None yet
Development: no linked issues yet
2 participants