Add support for ARM node groups #11629

Open
gurchik opened this issue Nov 15, 2023 · 9 comments
Labels: JIRA, kind/enhancement, QA/dev-automation (issues that engineers have written automation around so QA doesn't have to look at this)

Comments

@gurchik

gurchik commented Nov 15, 2023

Currently, eks-operator supports only three AMI types for node groups:

  • eks.AMITypesAl2X8664 - for x86_64 nodes (default case)
  • eks.AMITypesAl2X8664Gpu - for GPU x86_64 nodes (used when the gpu input is set to true)
  • eks.AMITypesCustom - used in all other cases, requires the user to create a custom launch template

https://github.com/rancher/eks-operator/blob/3e0b47a570397a23c82f8fb6ce8e8fee34582e0d/pkg/eks/create.go#L273-L281

It would be great if there was also support for eks.AMITypesAl2Arm64. This would allow the creation of Managed Node Groups that are running ARM. Currently the only way to target ARM is to create a custom launch template and specify it with the launch_template input. Creating a custom launch template is not preferable because, among other things, it requires creating a custom userdata script and you lose the ability to perform a managed node group update.

This constant is available in the version of aws-sdk-go being used: https://pkg.go.dev/github.com/aws/aws-sdk-go@v1.45.6/service/eks#pkg-constants

The desired result would be an arm input that I can set to true.
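
For illustration, the selection logic requested above could look roughly like the sketch below. This is not the eks-operator code: the nodeGroupOpts struct and pickAMIType function are hypothetical names, the string constants are local copies of the values behind the aws-sdk-go eks.AMITypes* constants so the snippet runs without the SDK, and the precedence (custom image first, then arm, then gpu) is an assumption.

```go
package main

import "fmt"

// Local copies of the string values behind the aws-sdk-go constants
// eks.AMITypesAl2X8664, eks.AMITypesAl2X8664Gpu, eks.AMITypesAl2Arm64
// and eks.AMITypesCustom, so this sketch compiles without the SDK.
const (
	amiTypeAL2X8664    = "AL2_x86_64"
	amiTypeAL2X8664GPU = "AL2_x86_64_GPU"
	amiTypeAL2ARM64    = "AL2_ARM_64"
	amiTypeCustom      = "CUSTOM"
)

// nodeGroupOpts is a hypothetical, simplified view of the node group inputs.
type nodeGroupOpts struct {
	Arm         bool // the proposed "arm" input
	Gpu         bool // the existing "gpu" input
	CustomImage bool // true when a custom launch template/image is supplied
}

// pickAMIType returns the AMI type for a managed node group.
// The precedence below is an assumption made for this sketch.
func pickAMIType(o nodeGroupOpts) string {
	switch {
	case o.CustomImage:
		return amiTypeCustom
	case o.Arm:
		return amiTypeAL2ARM64
	case o.Gpu:
		return amiTypeAL2X8664GPU
	default:
		return amiTypeAL2X8664
	}
}

func main() {
	fmt.Println(pickAMIType(nodeGroupOpts{}))
	fmt.Println(pickAMIType(nodeGroupOpts{Arm: true}))
	fmt.Println(pickAMIType(nodeGroupOpts{Gpu: true}))
}
```

With an input like this, an ARM node group would stay a regular managed node group, so no custom launch template or userdata script is needed.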

@kkaempf

kkaempf commented Nov 15, 2023

@davidstauffer 👆🏻

@valaparthvi

Tested via API on 2.9-head and it works fine.

"nodeGroups": [
    {
        "arm": true,
        "desiredSize": 2,
        "diskSize": 20,
        "ec2SshKey": null,
        "gpu": null,
        "imageId": null,
        "instanceType": "a1.medium",
        "labels": { },
        "launchTemplate": null,
        "maxSize": 3,
        "minSize": 1,
        "nodeRole": "arn:aws:iam::879933548321:role/auto-ekshostcluster-hp-cipze-node--NodeInstanceRole-YyTRKONmeTTW",
        "nodegroupName": "dng",
        "requestSpotInstances": false,
        "resourceTags": { },
        "spotInstanceTypes": null,
        "subnets": [
            "subnet-0196175050481fd97",
            "subnet-06bbbb08fbb70e046"
        ],
        "tags": { },
        "type": "/v3/schemas/nodeGroup",
        "userData": null,
        "version": "1.28"
    }
]
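
As a rough sketch of how a client might read the arm flag from a payload like the one above: the nodeGroup and payload structs below are hypothetical and cover only a few of the keys shown, and the sample string is a trimmed copy of the payload wrapped in braces so that it is valid JSON. Pointer fields are used because keys such as gpu come back as null.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeGroup is a hypothetical struct mirroring a subset of the payload keys.
// Pointers distinguish null/absent values from explicit false or "".
type nodeGroup struct {
	Arm           *bool   `json:"arm"`
	Gpu           *bool   `json:"gpu"`
	InstanceType  *string `json:"instanceType"`
	NodegroupName *string `json:"nodegroupName"`
	Version       *string `json:"version"`
}

// payload wraps the "nodeGroups" array in an object (hypothetical).
type payload struct {
	NodeGroups []nodeGroup `json:"nodeGroups"`
}

// isTrue treats a nil pointer (JSON null or missing key) as false.
func isTrue(b *bool) bool { return b != nil && *b }

// strOr returns the pointed-to string, or def when the pointer is nil.
func strOr(s *string, def string) string {
	if s == nil {
		return def
	}
	return *s
}

// parseNodeGroups decodes the node groups from a JSON document.
func parseNodeGroups(raw []byte) ([]nodeGroup, error) {
	var p payload
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, err
	}
	return p.NodeGroups, nil
}

// sample is a trimmed copy of the payload from this comment.
const sample = `{"nodeGroups":[{"arm":true,"gpu":null,"instanceType":"a1.medium","nodegroupName":"dng","version":"1.28"}]}`

func main() {
	groups, err := parseNodeGroups([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, ng := range groups {
		fmt.Printf("%s (%s): arm=%t gpu=%t\n",
			strOr(ng.NodegroupName, "<unnamed>"),
			strOr(ng.InstanceType, "<unset>"),
			isTrue(ng.Arm), isTrue(ng.Gpu))
	}
}
```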
  1. Add EKS with arm
  2. Add nodegroup
  3. Delete nodegroup
  4. Scale the nodegroup up/down. Here I noticed some flakiness. For example, for an existing nodegroup with InstanceType: a1.large, scaling up returned the following error: [Syncing error] error for cluster [c-vzmmp]: health error for node group [dng] in cluster [c-vzmmp]: Could not launch On-Demand Instances. Unsupported - Your requested instance type (a1.medium) is not supported in your requested Availability Zone (ap-south-1a). Please retry your request by not specifying an Availability Zone or choosing ap-south-1b, ap-south-1c. Launching EC2 instance failed.

Changes that UI will need:

  1. UI is missing a checkbox for "ARM".
  2. Instance types compatible with ARM are not visible in UI.

Overall I think it works well.

@furkatgofurov7

@kkaempf can we get help from UI for:

Changes that UI will need:

  1. UI is missing a checkbox for "ARM".
  2. Instance types compatible with ARM are not visible in UI.

@mjura
Contributor

mjura commented Aug 13, 2024

I don't see a UI ticket assigned to it. @valaparthvi, do you know if a UI issue was also created?

@valaparthvi

There is no UI ticket open. Should I transfer this issue to rancher/dashboard or create a new one? @mjura

@mjura
Contributor

mjura commented Aug 13, 2024

There is no UI ticket open. Should I transfer this issue to rancher/dashboard or create a new one? @mjura

Yes, please transfer this issue to rancher/dashboard

@kkaempf kkaempf transferred this issue from rancher/eks-operator Aug 13, 2024
@github-actions github-actions bot added the QA/dev-automation label Aug 13, 2024
@kkaempf

kkaempf commented Aug 13, 2024

@gaktive @nwmac - ptal 😉

@nwmac nwmac modified the milestones: v2.9.0, v2.10.0, v2.11.0 Aug 13, 2024
@nwmac
Member

nwmac commented Aug 13, 2024

Let's not move tickets to the dashboard repo with a milestone that has passed. Updating the milestone to 2.11.0. FYI @gaktive: this will need review, and it probably needs a JIRA ticket as well.

@gaktive
Member

gaktive commented Aug 14, 2024

Internal reference: SURE-8886

@gaktive gaktive added the JIRA label Aug 14, 2024
7 participants