Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proposal for EC2 instance selector integration and dry-run #3107

Merged
merged 5 commits into from
Mar 10, 2021

Conversation

cPu1
Copy link
Collaborator

@cPu1 cPu1 commented Jan 19, 2021

Description

Rendered

Closes #2998

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the userdocs directory)
  • Manually tested
  • Made sure the title of the PR is a good description that can go into the release notes
  • (Core team) Added labels for change area (e.g. area/nodegroup), target version (e.g. version/0.12.0) and kind (e.g. kind/improvement)

BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯

  • Backfilled missing tests for code in same general area 🎉
  • Refactored something and made the world a better place 🌟

Community Note:

  • Please let us know if you like this integration by adding a 👍 reaction to the PR description to help the maintainers understand the feature's impact on the community
  • Please do not leave "+1" or "me too" comments, they generate extra noise for PR followers and do not help move the discussion forward

@cPu1 cPu1 force-pushed the instance-selector-proposal branch from 645613c to ff58cbe Compare January 20, 2021 15:39
@aclevername
Copy link
Contributor

aclevername commented Jan 20, 2021

Lovely write up @cPu1.

Just to ensure I'm understanding:
When the user runs create cluster --dry-run and the user has provided:

managedNodeGroups:
- name: linux
  instanceSelector:
    vCPUs: 2
    memory: 4 

We compute instanceTypes using the instanceSelector values and present them, e.g.:

managedNodeGroups:
- name: linux
  instanceTypes: ["c3.large", "c4.large", "c5.large", "c5d.large", "c5n.large", "c5a.large"]
  instanceSelector:
    vCPUs: 2
    memory: 4 

Question:
Q1:
Why do we need to show them instanceTypes? Is the intention that the users will see the list of instanceTypes, go to there config, remove the instanceSelector field and replace it with instanceTypes and set values that they like, e.g. lets say the user doesn't want any c5* instances, after seeing the above output from a --dry-run they would modify the config to be:

managedNodeGroups:
- name: linux
  instanceTypes: ["c3.large", "c4.large"]

If this is the case then why do we add the complexity of this being part of the create cluster flow? The user should either:

  1. just set the instanceSelector fields and be happy with the decisions made by the EC2 Instance Selector
  2. Provide a list of instanceTypes themselves manually, we could make there lives easier by having an eksctl generate instance-types command added

I don't quite see the value in having this as part of create cluster/nodegroup

Q2:
Why would a user want to remove instanceTypes from the list? If this to try and reduce costs by removing bigger machine footprints could we not just introduce max/min-memory/vCPUs fields instead (EC2 Instance Selector has flags for these), e.g.:

managedNodeGroups:
- name: linux
  instanceSelector:
    minVCPUs: 2
    maxVCPUs: 4
    minMemory: 2
    maxMemory: 4

@Callisto13
Copy link
Contributor

Really thorough @cPu1 🎉

Furthering @aclevername's comment:

  1. Provide a list of instanceTypes themselves manually, we could move there lives easier by having an eksctl generate instance-types command added

In this case, we would be better off not providing anything they can just run the instance selector themselves.

IIUC, the value to adding this to eksctl is so that users do not need to think about it. If they want to disagree with the Selector, it is probably the exact same effort to just run the selector themselves than to do a 2 step process through us.
That said we do still need to return the config to them anyway, since their record will no longer be current.

@cPu1
Copy link
Collaborator Author

cPu1 commented Jan 21, 2021

  1. just set the instanceSelector fields and be happy with the decisions made by the EC2 Instance Selector

Users can do this by omitting the --dry-run flag. This is specified in the proposal:

If --dry-run is omitted, eksctl will proceed to creating the cluster and nodegroup with the instance types matching the instance selector resource criteria.

  1. Provide a list of instanceTypes themselves manually, we could make their lives easier by having an eksctl generate instance-types command added

Such a command would add little value because it'd be a mere wrapper around the ec2-instance-selector tool, which users can run themselves. This proposal is more about integrating the EC2 Instance Selector with the create cluster and create nodegroup workflows.

@aclevername
Copy link
Contributor

aclevername commented Jan 21, 2021

  1. just set the instanceSelector fields and be happy with the decisions made by the EC2 Instance Selector

Users can do this by omitting the --dry-run flag. This is specified in the proposal:

If --dry-run is omitted, eksctl will proceed to creating the cluster and nodegroup with the instance types matching the instance selector resource criteria.

👍

  1. Provide a list of instanceTypes themselves manually, we could make their lives easier by having an eksctl generate instance-types command added

Such a command would add little value because it'd be a mere wrapper around the ec2-instance-selector tool, which users can run themselves. This proposal is more about integrating the EC2 Instance Selector with the create cluster and create nodegroup workflows.

I don't understand the value it being printed out create cluster/nodegroup workflow. Having the feature is great, but if they don't like the values that get generated they could just run the binary themselves and provide the list themselves (this is why I suggested having an eksctl subcommand to wrap it to prevent them having to download it themselves)

@cPu1
Copy link
Collaborator Author

cPu1 commented Feb 9, 2021

I don't understand the value it being printed out create cluster/nodegroup workflow. Having the feature is great, but if they don't like the values that get generated they could just run the binary themselves and provide the list themselves (this is why I suggested having an eksctl subcommand to wrap it to prevent them having to download it themselves)

The proposed --dry-run feature is a general-purpose option that is not specifically tied to ec2-instance-selector. The default behaviour of eksctl create cluster is to create a cluster with a 2-node nodegroup that uses m5.large instances. The dry-run feature lets users generate a base config that can be used as a starting point for making modifications. The purpose of dry-run is to output a ClusterConfig representing the CLI options passed. When this feature is used with the instance selector options, for consistency, it also outputs the instance types matching the instance selector criteria.

@aclevername
Copy link
Contributor

I don't understand the value it being printed out create cluster/nodegroup workflow. Having the feature is great, but if they don't like the values that get generated they could just run the binary themselves and provide the list themselves (this is why I suggested having an eksctl subcommand to wrap it to prevent them having to download it themselves)

The proposed --dry-run feature is a general-purpose option that is not specifically tied to ec2-instance-selector. The default behaviour of eksctl create cluster is to create a cluster with a 2-node nodegroup that uses m5.large instances. The dry-run feature lets users generate a base config that can be used as a starting point for making modifications. The purpose of dry-run is to output a ClusterConfig representing the CLI options passed. When this feature is used with the instance selector options, for consistency, it also outputs the instance types matching the instance selector criteria.

I think as a feature aside from ec2-instance-selector it sounds like it has value, and might fit in well with #2774, in particular it might have some overlap with generating manifests for existing imperatively created clusters. However just focusing on the EC2 selector support for the moment I still think that the user should either:

  1. Just set the instanceSelector fields and be happy with the decisions made by the EC2 Instance Selector
  2. Provide a list of instanceTypes themselves manually

@aclevername aclevername requested a review from a team February 17, 2021 13:25
Copy link

@tabern tabern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. Some comments on the overview for this feature as well as questions around how we represent. No major blockers in the current design.


## Motivation

With support for multiple instance types and Spot instances for EKS Managed Nodegroups, users can create Spot instances where EKS will configure and launch an Autoscaling Group of Spot instances following Spot best practices. Users can set `managedNodeGroup.spot` and supply a list of instance types in `managedNodeGroup.instanceTypes` to instruct eksctl to create a managed nodegroup using Spot instances.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very focused on spot. Can we change the motivation to focus on the general case (that of course includes spot) so that users are not confused?


With support for multiple instance types and Spot instances for EKS Managed Nodegroups, users can create Spot instances where EKS will configure and launch an Autoscaling Group of Spot instances following Spot best practices. Users can set `managedNodeGroup.spot` and supply a list of instance types in `managedNodeGroup.instanceTypes` to instruct eksctl to create a managed nodegroup using Spot instances.

It is a Spot best practice to use multiple instance types, but with over 270 EC2 instance types, users have to spend time figuring out which instance types would be well suited for their nodegroup.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same call out here - this is an issue in general, not just for those using Spot.

The [EC2 Instance Selector](https://github.com/aws/amazon-ec2-instance-selector) tool addresses this problem by generating a list of instance types based on resource criteria such as vCPUs, memory, etc. This proposal aims to integrate the EC2 instance selector with eksctl to enable creating nodegroups with multiple instance types by passing the resource criteria.

### Problem: a simple integration
We could add the instance selector options—as CLI options (`--instance-selector-vcpus` and `--instance-selector-memory`) and in the config file—to `create cluster` and `create nodegroup` that would select the instance types matching the resource criteria for the user and proceed to creating the nodegroup. However, this gives users no control over selecting the instance types. Users may want to inspect and change the instances matching the instance selector criteria before proceeding to creating a nodegroup, and also to document the instance types being used for their nodegroups in the ClusterConfig file.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a strawman against a certain option. Can we be more clear about the problem we want to solve here?

In particular, is it critical that we have a "human in the loop" here? We could always modify the config file with the selections and output them. Is there benefit to having the manual selection process? Doesn't this require the user to have an understanding of the instance type options anyways?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading further - it looks like this is actually the proposal - but we negate it early to introduce the --dry-run functionality. Let's consider cutting or revising this section as its confusing.


## Proposal

This design proposes adding a new general-purpose `--dry-run` option to `eksctl create cluster` and `eksctl create nodegroup` that skips cluster and nodegroup creation and instead outputs a ClusterConfig file representing the CLI options passed. This functionality extends to the instance selector integration where eksctl will set the instance types for a nodegroup when the instance selector options are supplied. This gives users the opportunity to inspect and, if required, change the instance types before proceeding with cluster and nodegroup creation. Users can omit the `--dry-run` option if they do not care about the instance types selected by eksctl.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--dry-run is one aspect of the feature. Shouldn't we lead with the new --instance-selector-[ ] functionality?

Along with the `--dry-run` option, two new options `--instance-selector-vcpus` and `--instance-selector-memory` will be added to `eksctl create cluster` and `eksctl create nodegroup`. These options will also be supported in the ClusterConfig file for both managed and self-managed nodegroups with the following schema:

```yaml
instanceSelector:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are vCPU and memory the only inputs we will accept?

Copy link
Collaborator Author

@cPu1 cPu1 Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd start off by adding vCPU and memory, and add more options based on user feedback. Otherwise we'd have to add a CLI option instance-selector-* for each instance selector option. Are there other options that you think we should add in the initial implementation?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GPU would be an important one

is --instance-selector too verbose?


### eksctl create cluster

When `eksctl create cluster` is called with the instance selector options and `--dry-run`, eksctl will output a ClusterConfig file containing a nodegroup representing the CLI options and the instance types set to the instances matched by the instance selector resource criteria.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should ensure --dry-run outputs all options, not just node groups!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update looks good!

- name: <name>
instancesDistribution:
instanceTypes: ["c5.large", "c5a.large", "c5ad.large", "c5d.large", "t2.medium", "t3.medium", "t3a.medium"]
instanceSelector:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we automatically add a comment in the file "Ignored during execution" or something of the like?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. We can have something like:

  # Ignored during execution since `instanceTypes` is also set.
  instanceSelector:


managedNodeGroups:
- name: linux
instanceSelector:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it confusing that this is only in the MNG section? How do we handle merging MNG and non-MNG groups?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. This field is also present in the nodeGroups section below: https://github.com/weaveworks/eksctl/pull/3107/files#diff-af7ec3d70c271157e75ad67f9bb60c75c8596ace0bee45c3b6a83d7debd9408cR126

How do we handle merging MNG and non-MNG groups?

Can you expand on this?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you answered my question -must had missed it on the first pass.

- name: <name>
spot: true
instanceTypes: ["c3.large", "c4.large", "c5.large", "c5d.large", "c5n.large", "c5a.large"]
instanceSelector:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@aclevername
Copy link
Contributor

I think as a feature aside from ec2-instance-selector it sounds like it has value, and might fit in well with #2774, in particular it might have some overlap with generating manifests for existing imperatively created clusters. However just focusing on the EC2 selector support for the moment I still think that the user should either:

  1. Just set the instanceSelector fields and be happy with the decisions made by the EC2 Instance Selector
  2. Provide a list of instanceTypes themselves manually

@cPu1 I'm still slightly confused as to why we need the --dry-run functionality for this feature, I still stand by what I said above. Am I missing a different reason as to why we need it for this feature?

@cPu1 cPu1 changed the title Add proposal for EC2 instance selector integration Add proposal for EC2 instance selector integration and dry-run Mar 1, 2021
@cPu1 cPu1 force-pushed the instance-selector-proposal branch from ff58cbe to 9e9a888 Compare March 2, 2021 14:47
Copy link

@tabern tabern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update LGTM 👍

We may want to add GPU as an instance selector option.
Also is --instance-select-x too verbose? May not be but wanted to see if you had thoughts on other options.

While the instance selector feature helps select a set of instances matching the resource criteria, some users may want to inspect and change the instances selected by the instance selector before proceeding to creating a nodegroup, and also to document the instance types being used for their nodegroups in the ClusterConfig file. For this reason, this design also proposes adding a new general-purpose `--dry-run` option to `eksctl create cluster` and `eksctl create nodegroup` that skips cluster and nodegroup creation and instead outputs a ClusterConfig file containing the default values set by eksctl and representing the supplied CLI options. This functionality extends to the instance selector integration where eksctl will set the instance types for a nodegroup when the instance selector options are supplied. This gives users the opportunity to inspect and, if required, change the instance types before proceeding with cluster and nodegroup creation. Users can omit the `--dry-run` option if they do not care about the instance types selected by eksctl.

There are certain one-off options that cannot be represented in the ClusterConfig file, e.g., `--install-vpc-controllers`.
Users would expect `eksctl create cluster --<options...> --dry-run > config.yaml` followed by `eksctl create cluster -f config.yaml` to be equivalent to running the first command without `--dry-run`. eksctl would therefore disallow passing options that cannot be represented in the config file when `--dry-run` is passed.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Along with the `--dry-run` option, two new options `--instance-selector-vcpus` and `--instance-selector-memory` will be added to `eksctl create cluster` and `eksctl create nodegroup`. These options will also be supported in the ClusterConfig file for both managed and self-managed nodegroups with the following schema:

```yaml
instanceSelector:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GPU would be an important one

is --instance-selector too verbose?


### eksctl create cluster

When `eksctl create cluster` is called with the instance selector options and `--dry-run`, eksctl will output a ClusterConfig file containing a nodegroup representing the CLI options and the instance types set to the instances matched by the instance selector resource criteria.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update looks good!

@cPu1
Copy link
Collaborator Author

cPu1 commented Mar 2, 2021

We may want to add GPU as an instance selector option.

Noted, I'll add it to the proposal. To be clear, the gpus option from instance selector that specifies the total number of GPUs, right?

Also is --instance-select-x too verbose? May not be but wanted to see if you had thoughts on other options.

It may be too verbose, but the instance selector option names by themselves are not very clear in the context of eksctl, e.g., just --vCPUs doesn't imply the use of instance selector. We need some prefix that indicates it's an instance selector option. Having the --instance-selector-* prefix also chimes in with the instanceSelector ClusterConfig field. It also helps users find all available instance selector options by typing --instance-selector and pressing Tab.

@cPu1 cPu1 force-pushed the instance-selector-proposal branch from ba29124 to ead28cd Compare March 3, 2021 12:26
Comment on lines 8 to 9
- [Proposal](#proposal)
- [Dry Run](#dry-run)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these be in two separate proposal files? It feels a bit hard to distinguish them atm

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They could be, but I'll have to reword some of the bits because right now it's written within the context of instance selector. I was planning to rename the title and filename to include dry-run, so I'll do that.

@cPu1 cPu1 force-pushed the instance-selector-proposal branch from aa60887 to f5bd1db Compare March 10, 2021 13:05
@cPu1 cPu1 merged commit 122dc49 into eksctl-io:main Mar 10, 2021
@cPu1 cPu1 deleted the instance-selector-proposal branch March 10, 2021 14:05
@cPu1 cPu1 restored the instance-selector-proposal branch March 10, 2021 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EC2 Instance Selector Integration
4 participants