Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: deprecated AMIs should be discoverable when specifying ami id o… #6500

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

shabbskagalwala
Copy link

…n the amiSelectorTerms

Fixes #6393

Description
Deprecated AMI's should be discovered by Karpenter for existing node pool to avoid disruptions to workloads in production systems.

How was this change tested?

  1. make verify - to update the CRD definitions and apply to a development EKS cluster
  2. KO_DOCKER_REPO=4xxxxxxxxxx5.dkr.ecr.us-west-2.amazonaws.com/shabbskagalwala/karpenter-provider-aws/controller make image to build and push the image, which was consumed in EKS.
  3. Updating the ec2nodeclass to use a deprecated ami ami-002c17c43a2d082b5 which resulted in Karpenter successfully creating a nodeclaim with no errors as seen below
nodeclaim using deprecated ami
✗ k get nodeclaim cosmic-lamb-ljf5w -o yaml
apiVersion: karpenter.sh/v1beta1
kind: NodeClaim
metadata:
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: "1736737552143516180"
    karpenter.k8s.aws/ec2nodeclass-hash-version: v2
    karpenter.k8s.aws/tagged: "true"
    karpenter.sh/nodepool-hash: "4656253658582728767"
    karpenter.sh/nodepool-hash-version: v2
    meta.helm.sh/release-name: cosmic-lamb
    meta.helm.sh/release-namespace: cosmic-lamb
  creationTimestamp: "2024-07-11T21:29:31Z"
  finalizers:
  - karpenter.sh/termination
  generateName: cosmic-lamb-
  generation: 1
  labels:
    ap.xyz.com/node-group: cosmic-lamb
    karpenter.k8s.aws/instance-category: m
    karpenter.k8s.aws/instance-cpu: "16"
    karpenter.k8s.aws/instance-cpu-manufacturer: aws
    karpenter.k8s.aws/instance-ebs-bandwidth: "4750"
    karpenter.k8s.aws/instance-encryption-in-transit-supported: "false"
    karpenter.k8s.aws/instance-family: m6g
    karpenter.k8s.aws/instance-generation: "6"
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "65536"
    karpenter.k8s.aws/instance-network-bandwidth: "5000"
    karpenter.k8s.aws/instance-size: 4xlarge
    karpenter.sh/capacity-type: on-demand
    karpenter.sh/nodepool: cosmic-lamb
    kubernetes.io/arch: arm64
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: m6g.4xlarge
    topology.k8s.aws/zone-id: usw2-az2
    topology.kubernetes.io/region: us-west-2
    topology.kubernetes.io/zone: us-west-2a
  name: cosmic-lamb-ljf5w
  ownerReferences:
  - apiVersion: karpenter.sh/v1beta1
    blockOwnerDeletion: true
    kind: NodePool
    name: cosmic-lamb
    uid: 2b45b290-f995-4146-ad4c-6a7af5948522
  resourceVersion: "397794878"
  uid: af6d4e08-d3a5-46d6-80cc-47a341487c4e
spec:
  nodeClassRef:
    apiVersion: karpenter.k8s.aws/v1beta1
    kind: EC2NodeClass
    name: cosmic-lamb
  requirements:
  - key: karpenter.sh/nodepool
    operator: In
    values:
    - cosmic-lamb
  - key: karpenter.k8s.aws/instance-size
    operator: In
    values:
    - 2xlarge
    - 4xlarge
    - 8xlarge
    - xlarge
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - m6g.4xlarge
    - m6g.8xlarge
  - key: topology.kubernetes.io/zone
    operator: In
    values:
    - us-west-2a
  - key: kubernetes.io/arch
    operator: In
    values:
    - arm64
  - key: ap.xyz.com/node-group
    operator: In
    values:
    - cosmic-lamb
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  - key: karpenter.k8s.aws/instance-generation
    operator: Gt
    values:
    - "2"
  - key: karpenter.k8s.aws/instance-hypervisor
    operator: In
    values:
    - nitro
  - key: karpenter.k8s.aws/instance-family
    operator: In
    values:
    - m6g
  - key: kubernetes.io/os
    operator: In
    values:
    - linux
  resources:
    requests:
      cpu: 3390m
      ephemeral-storage: 68Gi
      memory: 18000Mi
      pods: "74"
status:
  allocatable:
    cpu: 15890m
    ephemeral-storage: 89Gi
    memory: 57632Mi
    pods: "234"
    vpc.amazonaws.com/pod-eni: "54"
  capacity:
    cpu: "16"
    ephemeral-storage: 100Gi
    memory: 60561Mi
    pods: "234"
    vpc.amazonaws.com/pod-eni: "54"
  conditions:
  - lastTransitionTime: "2024-07-11T21:30:14Z"
    message: RequirementsDrifted
    reason: RequirementsDrifted
    status: "True"
    type: Drifted
  - lastTransitionTime: "2024-07-11T21:30:30Z"
    message: ""
    reason: Initialized
    status: "True"
    type: Initialized
  - lastTransitionTime: "2024-07-11T21:29:33Z"
    message: ""
    reason: Launched
    status: "True"
    type: Launched
  - lastTransitionTime: "2024-07-11T21:30:30Z"
    message: ""
    reason: Ready
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-07-11T21:30:21Z"
    message: ""
    reason: Registered
    status: "True"
    type: Registered
  imageID: ami-0d20e6af81d7ce566
  nodeName: ip-172-xx-xxx-xxx.us-west-2.compute.internal
  providerID: aws:///us-west-2a/i-0xxxxxxxxxxxxxx1
deprecated ami as seen from aws cli
✗ aws ec2 describe-images --region us-west-2 --image-ids ami-0d20e6af81d7ce566
{
    "Images": [
        {
            "Architecture": "arm64",
            "CreationDate": "2024-03-31T07:04:33.000Z",
            "ImageId": "ami-0d20e6af81d7ce566",
            "ImageLocation": "0xxxxxxxxxx4/amazon-eks-arm64-node-1.27",
            "ImageType": "machine",
            "Public": false,
            "OwnerId": "0xxxxxxxxxx4",
            "PlatformDetails": "Linux/UNIX",
            "UsageOperation": "RunInstances",
            "State": "available",
            "BlockDeviceMappings": [
                {
                    "DeviceName": "/dev/xvda",
                    "Ebs": {
                        "DeleteOnTermination": true,
                        "Iops": 3000,
                        "SnapshotId": "snap-05784dd70fd8b6b0c",
                        "VolumeSize": 10,
                        "VolumeType": "gp3",
                        "Throughput": 125,
                        "Encrypted": true
                    }
                }
            ],
            "Description": "Kubernetes Worker AMI with AmazonLinux2 image",
            "EnaSupport": true,
            "Hypervisor": "xen",
            "Name": "amazon-eks-arm64-node-1.27",
            "RootDeviceName": "/dev/xvda",
            "RootDeviceType": "ebs",
            "SriovNetSupport": "simple",
            "Tags": [
              ...
              ...
              ...
              ...
              ...
            ],
            "VirtualizationType": "hvm",
            "BootMode": "uefi",
            "DeprecationTime": "2024-07-03T06:09:00.000Z"
        }
    ]
}

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@shabbskagalwala shabbskagalwala requested a review from a team as a code owner July 11, 2024 21:54
Copy link

netlify bot commented Jul 11, 2024

Deploy Preview for karpenter-docs-prod canceled.

Name Link
🔨 Latest commit c2d3e81
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/66c6d151fa269500085f570e

Copy link
Contributor

@njtran njtran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll need to edit the sorting logic here too

@shabbskagalwala shabbskagalwala force-pushed the feat/ami-deprecation branch 4 times, most recently from 76282c4 to cbb40af Compare July 15, 2024 17:48
sort.Slice(a, func(i, j int) bool {
ideptime := lo.TernaryF(
a[i].DeprecationTime != "",
func() time.Time { t, _ := time.Parse(time.RFC3339, a[i].DeprecationTime); return t },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fail loudly if this invariant is violated

Suggested change
func() time.Time { t, _ := time.Parse(time.RFC3339, a[i].DeprecationTime); return t },
func() time.Time { t := lo.Must(time.Parse(time.RFC3339, a[i].DeprecationTime)); return t },

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Updated :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ellistarn how is this different from the existing, where errors are ignored? Should that as well become lo.Must?

itime, _ := time.Parse(time.RFC3339, a[i].CreationDate)
jtime, _ := time.Parse(time.RFC3339, a[j].CreationDate)

Copy link
Author

@shabbskagalwala shabbskagalwala Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question - this is the same for the existing piece for CreationDate where we have the following lines - so i used the same logic for DeprecationTime

		itime, _ := time.Parse(time.RFC3339, a[i].CreationDate)
		jtime, _ := time.Parse(time.RFC3339, a[j].CreationDate)
		if itime.Unix() != jtime.Unix() {
			return itime.Unix() > jtime.Unix()
		}
		return a[i].AmiID < a[j].AmiID

@coveralls
Copy link

coveralls commented Jul 15, 2024

Pull Request Test Coverage Report for Build 10087006300

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 25 of 25 (100.0%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.03%) to 79.163%

Files with Coverage Reduction New Missed Lines %
pkg/providers/amifamily/ami.go 1 91.73%
Totals Coverage Status
Change from base Build 10085505927: 0.03%
Covered Lines: 5942
Relevant Lines: 7506

💛 - Coveralls

Comment on lines 89 to 90
itime, _ := time.Parse(time.RFC3339, a[i].CreationDate)
jtime, _ := time.Parse(time.RFC3339, a[j].CreationDate)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed that we want to do a Must here, too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a lot of unrelated tests with the changes in PR fail when using Must here

Summarizing 6 Failures:
  [PANICKED!] NodeClass InstanceProfile Status Controller [It] should resolve the specified instance profile into the status when using instanceProfile field
  /Users/shabbirkagalwala/go/pkg/mod/github.com/samber/lo@v1.44.0/errors.go:53
  [PANICKED!] NodeClass InstanceProfile Status Controller [It] should not call the the IAM API when specifying an instance profile
  /Users/shabbirkagalwala/go/pkg/mod/github.com/samber/lo@v1.44.0/errors.go:53
  [PANICKED!] NodeClass InstanceProfile Status Controller [It] should not call CreateInstanceProfile or AddRoleToInstanceProfile when instance profile exists with correct role
  /Users/shabbirkagalwala/go/pkg/mod/github.com/samber/lo@v1.44.0/errors.go:53
  [PANICKED!] NodeClass InstanceProfile Status Controller [It] should update the role for the instance profile when the wrong role exists
  /Users/shabbirkagalwala/go/pkg/mod/github.com/samber/lo@v1.44.0/errors.go:53
  [PANICKED!] NodeClass InstanceProfile Status Controller [It] should create the instance profile when it doesn't exist
  /Users/shabbirkagalwala/go/pkg/mod/github.com/samber/lo@v1.44.0/errors.go:53
  [PANICKED!] NodeClass InstanceProfile Status Controller [It] should add the role to the instance profile when it exists without a role
  /Users/shabbirkagalwala/go/pkg/mod/github.com/samber/lo@v1.44.0/errors.go:53

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. We can address separately.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated based on conversation.

@shabbskagalwala shabbskagalwala force-pushed the feat/ami-deprecation branch 2 times, most recently from c61174d to 75f54a8 Compare July 15, 2024 23:07
Copy link
Contributor

@ellistarn ellistarn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
I'd like to get @engedaam's approval on this, too.

Copy link
Contributor

@engedaam engedaam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main comments are round observability and drift

MaxResults: aws.Int64(1000),
Filters: lo.Ternary(len(q.Filters) > 0, q.Filters, nil),
Owners: lo.Ternary(len(q.Owners) > 0, lo.ToSlicePtr(q.Owners), nil),
IncludeDeprecated: aws.Bool(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should notify users when Karpenter is launching instances with deprecated AMIs. IMO we should at least have warning logs. I think we should have a mechanism for users to be able to identify nodes that are deprecated. I'm not sure what that mechanism should be, but so option could be adding an annotation to the nodeclaim/node, eventing, nodeclaims status condition, metrics, or even repeatedly logging all nodes that are using deprecated AMIs. I personally like the idea of having an annotation to signal a Karpenter owned instances that is using a deprecated AMI. @ellistarn any opinions here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am with you that adding an annotation that an ami is deprecated is a good idea on the nodeclaim

Copy link
Contributor

@ellistarn ellistarn Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@engedaam Thoughts on a warning status condition on the NodeClass?
Logging/events strike me as spammy, and potentially ignored. Condition will cause a metric, which has nice benefits, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on a warning status condition on the NodeClass?

I like the idea of having the warning status condition on the NodeClass. Customers will need to do the leg work of finding the instances using the deprecated that are owned by the NodeClass themselves, but I don't think that would be a heavy lift for them.

Logging/events strike me as spammy, and potentially ignored. Condition will cause a metric, which has nice benefits, too.

I agree with that

Filters: lo.Ternary(len(q.Filters) > 0, q.Filters, nil),
Owners: lo.Ternary(len(q.Owners) > 0, lo.ToSlicePtr(q.Owners), nil),
IncludeDeprecated: aws.Bool(true),
MaxResults: aws.Int64(1000),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what is the expectation around drift? Currenlty, karpenter will use the resolved list of AMIs to check against the instances that were launched by Karpenter. If the AMI used on a node exist in the resolved list of AMIs the node will not be drifted. With this PR, if an AMI is deprecated and a new AMI is resolved by the same amiSelectorTerms, the existing nodes would not be drifted however new nodes launched will use the new non-deprecated AMI. What is thinking here? Should we drift nodes that are using deprecated AMIs if there is a new non-deprecated AMI is resolved?

Copy link
Author

@shabbskagalwala shabbskagalwala Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the idea was when following the pattern as described here that Karpenter should not disrupt workloads when an AMI is deprecated and amiSelectorTerms contain only an id. Karpenter does a real amazing job of detecting newer amis when launching nodes when amiSelectorTerms contains names or tags which support wildcards.

Should we drift nodes that are using deprecated AMIs if there is a new non-deprecated AMI is resolved?

In my opinion yes if a new non-deprecated AMI is available nodes should be drifted. I think that would warrant a change here

Copy link
Contributor

@ellistarn ellistarn Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC our previous conversations decided:

  1. We will migrate off of deprecated AMIs if we have multiple in the selector
  2. We will keep using deprecated AMIs if there is no non-deprecated alternative

e.g., we will only use a deprecated AMI if there is no other option, but we won't take an outage.

Copy link
Contributor

@engedaam engedaam Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, Karpenter will scale-up nodes based on the sorted AMIs, but not for drift. We only check if the AMI used on the instance is within the resolved list of AMIs for the nodeclass. We will need to update our drift detection to find nodes that are using deprecated AMIs that can be updated to drifted to use non-deprecated AMIs.

Karpenter does a real amazing job of detecting newer amis when launching nodes when amiSelectorTerms contains names or tags which support wildcards.

  1. We will migrate off of deprecated AMIs if we have multiple in the selector

In the current state of the PR, we will need to make updates here or we break that behavior all together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this decision might require broader review from the other maintainers.

Copy link
Author

@shabbskagalwala shabbskagalwala Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ellistarn @engedaam my apologies for the delayed response but I want to share my findings and make sure I am not misunderstanding the requirement and discussion here.

  1. IncludeDeprecated only takes effect when the AMI owner is not self (the current account) for example when a central AWS account is responsible for building and sharing AMIs to multiple accounts in the organization. By default deprecated AMIs in the same AWS account i.e the owner is self are visible always.
  2. I ran a few tests on my sandbox EKS cluster and the drift works as expected as long as a new non deprecated AMI is available and the amiSelectorTerms don't have the AMI id pinned. As an example here's what i tested with (as an example with nodepool of type of only 1 instance-family and arch )

I ran the same tests below with using tags and name on the amiSelectorTerms

spec:
  amiFamily: AL2
  amiSelectorTerms:
  - name: dummy-amazon-eks-node-1.29-*
    owner: "111111111111" 

The above is in AWS account 999999999999 and the AMI is being shared from AWS account 111111111111

  1. Next due to the selector terms we see that all nodes from this ec2nc are using the AMI
status:
  amis:
  - id: ami-053ccf7a5d1c364ac
    name: dummy-amazon-eks-node-1.29-test-20240625
    requirements:
    - key: kubernetes.io/arch
      operator: In
      values:
      - amd64

and the node claim

   conditions:
  - lastTransitionTime: "2024-07-25T04:18:14Z"
    message: ""
    reason: Initialized
    status: "True"
    type: Initialized
  - lastTransitionTime: "2024-07-25T04:17:12Z"
    message: ""
    reason: Launched
    status: "True"
    type: Launched
  - lastTransitionTime: "2024-07-25T04:18:14Z"
    message: ""
    reason: Ready
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-07-25T04:18:06Z"
    message: ""
    reason: Registered
    status: "True"
    type: Registered
  imageID: ami-053ccf7a5d1c364ac
  nodeName: ip-172-xx-xxx-xx.us-west-2.compute.internal
  providerID: aws:///us-west-2b/i-0xxxxxxxxxxxxxx82
  1. I then deprecated this AMI as visible below and scaled up pods in an namespace which scaled up and added new nodeclaims and worked as expected - otherwise without the changes in this PR Karpenter would fail to scale up new nodes
✗ aws ec2 describe-images --region us-west-2 --image-ids ami-053ccf7a5d1c364ac
{
    "Images": [
        {
            "Architecture": "x86_64",
            "CreationDate": "2024-07-25T04:07:58.000Z",
            "ImageId": "ami-053ccf7a5d1c364ac",
            "ImageLocation": "111111111111/dummy-amazon-eks-node-1.29-test-20240625",
            "ImageType": "machine",
            "Public": false,
            "OwnerId": "111111111111",
            "PlatformDetails": "Linux/UNIX",
            "UsageOperation": "RunInstances",
            "State": "available",
            ...
            ...
            ...
            ...
            "VirtualizationType": "hvm",
            "DeprecationTime": "2024-07-25T22:30:00.000Z",
            "ImdsSupport": "v2.0"
        }
    ]
}
  1. Then, I created a new AMI in AWS account 111111111111 and shared it with account 999999999999. Since the creationDate of this new AMI was more recent - I see the drift working as expected.

Here's the nodeclaims from the new AMI

  conditions:
  - lastTransitionTime: "2024-07-29T21:49:52Z"
    message: ""
    reason: Initialized
    status: "True"
    type: Initialized
  - lastTransitionTime: "2024-07-29T21:48:58Z"
    message: ""
    reason: Launched
    status: "True"
    type: Launched
  - lastTransitionTime: "2024-07-29T21:49:52Z"
    message: ""
    reason: Ready
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-07-29T21:49:44Z"
    message: ""
    reason: Registered
    status: "True"
    type: Registered
  imageID: ami-0ab3fd8c11a5baed9
  nodeName: ip-172-xx-xxx-xxx.us-west-2.compute.internal
  providerID: aws:///us-west-2c/i-0xxxxxxxxxxxxxxdc

and the ec2nc

status:
  amis:
  - id: ami-0ab3fd8c11a5baed9
    name: dummy-amazon-eks-node-1.29-test-20240703
    requirements:
    - key: kubernetes.io/arch
      operator: In
      values:
      - amd64

and the nodeclaims (notice the drift working where old nodes 4d17h of age were drifted and new nodes with the new AMI were spun up)

→ [04:50:17] (git:main *=) ✗ k get nodeclaim
NAME                  TYPE         ZONE         NODE                                           READY   AGE
cosmic-lamb-1-5kxj7   m5.xlarge    us-west-2c   ip-172-xx-xxx-xxx.us-west-2.compute.internal   True    3m27s
cosmic-lamb-1-cqjcm   m5.xlarge    us-west-2a   ip-172-xx-xxx-xxx.us-west-2.compute.internal   True    4d17h
cosmic-lamb-1-fccr4   m5.xlarge    us-west-2b   ip-172-xx-xxx-xxx.us-west-2.compute.internal   True    4m39s
→ [04:52:23] (git:main *=) ✗ k get nodeclaim
NAME                  TYPE         ZONE         NODE                                           READY   AGE
cosmic-lamb-1-5kxj7   m5.xlarge    us-west-2c   ip-172-xx-xxx-xxx.us-west-2.compute.internal   True    3m48s
cosmic-lamb-1-fccr4   m5.xlarge    us-west-2b   ip-172-xx-xxx-xxx.us-west-2.compute.internal   True    5m

and the new AMI

✗ aws ec2 describe-images --region us-west-2 --image-ids ami-0ab3fd8c11a5baed9 
{
    "Images": [
        {
            "Architecture": "x86_64",
            "CreationDate": "2024-07-29T21:16:39.000Z",
            "ImageId": "ami-0ab3fd8c11a5baed9",
            "ImageLocation": "111111111111/dummy-amazon-eks-node-1.29-test-20240703",
            "ImageType": "machine",
            "Public": false,
            "OwnerId": "111111111111",
            "PlatformDetails": "Linux/UNIX",
            "UsageOperation": "RunInstances",
            "State": "available",
            ...
            ...
            ...
            ...
            "EnaSupport": true,
            "Hypervisor": "xen",
            "Name": "dummy-amazon-eks-node-1.29-test-20240703",
            "RootDeviceName": "/dev/xvda",
            "RootDeviceType": "ebs",
            "SriovNetSupport": "simple",
            "VirtualizationType": "hvm",
            "ImdsSupport": "v2.0"
        }
    ]
}

The only scenario where drift doesn't work is when an AMI is deprecated and no new AMIs are available, Karpenter will not fall back to a non deprecated AMI with an older creationDate from the deprecated AMI.

Do we need to fall back to an older non deprecated AMI if there are no new non deprecated AMIs available?

Copy link
Author

@shabbskagalwala shabbskagalwala Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/disable-an-ami.html

When an AMI is disabled - there's a separate flag for that, to make it discoverable IncludeDisabled

Additionally

A disabled AMI can't be shared. If an AMI was public or previously shared, it is made private. If an AMI was shared with an AWS account, organization, or Organizational Unit, they lose access to the disabled AMI.

and

EC2 instances that were previously launched using an AMI that is subsequently disabled are not affected, and can be stopped, started, and rebooted.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ellistarn @engedaam @jonathan-innis awesome 🥳 karpenter v1 is released, i am excited to try it out soon. I just wanted to check with you all if we can reinitiate the conversation for this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@engedaam @jonathan-innis apologies for the tag but just wanted to bump this PR again and see if we can move forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: deprecated AMIs should be able to be listed when specifying ami id on the amiSelectorTerms
6 participants