Support for EC2 Spot Best practices - Diversification of instances using ABS #1400

Closed
ruecarlo opened this issue Nov 9, 2021 · 4 comments · Fixed by #1556
Labels
good first issue, help wanted


ruecarlo commented Nov 9, 2021

Currently, the lambda in charge of scaling up nodes (including Spot configurations) uses the RunInstances API to create a Spot instance. Because Spot instances are spare EC2 capacity, availability for any single instance type may be limited. The best practice with Spot is to diversify across a set of instance types that qualify for the workload and to use one of the APIs that supports that diversification.

The suggestion is to replace the RunInstances call with its drop-in replacement: EC2 Fleet in instant mode with the Spot capacity-optimized allocation strategy. EC2 Fleet allows diversification while still providing a synchronous API, and it follows Spot best practices by selecting the Spot instance types least likely to be interrupted for the workload. More examples here.
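A minimal sketch of what such a request could look like, assuming the scale-up lambda is written in TypeScript. The interfaces are a hand-written subset of the EC2 CreateFleet request (field names follow the EC2 API; they are not imported from the AWS SDK), and the launch template name github-runner and the instance types are hypothetical. One launch template plus one override per instance type gives EC2 several Spot pools to choose from.

```typescript
// Hand-written subset of the EC2 CreateFleet request shape (illustrative only).
interface FleetOverride {
  InstanceType?: string;
  SubnetId?: string;
}

interface CreateFleetRequestSketch {
  Type: 'instant';
  LaunchTemplateConfigs: {
    LaunchTemplateSpecification: { LaunchTemplateName: string; Version: string };
    Overrides: FleetOverride[];
  }[];
  SpotOptions: { AllocationStrategy: 'capacity-optimized' };
  TargetCapacitySpecification: {
    TotalTargetCapacity: number;
    DefaultTargetCapacityType: 'spot';
  };
}

// Builds an instant-type fleet request that lets EC2 pick, among several
// qualifying instance types, the Spot pools with the most spare capacity.
function buildSpotFleetRequest(
  launchTemplateName: string,
  instanceTypes: string[],
  count: number,
): CreateFleetRequestSketch {
  return {
    Type: 'instant', // synchronous: launched instances and errors come back in the response
    LaunchTemplateConfigs: [
      {
        LaunchTemplateSpecification: { LaunchTemplateName: launchTemplateName, Version: '$Default' },
        // One override per instance type diversifies across Spot pools.
        Overrides: instanceTypes.map((InstanceType) => ({ InstanceType })),
      },
    ],
    SpotOptions: { AllocationStrategy: 'capacity-optimized' },
    TargetCapacitySpecification: {
      TotalTargetCapacity: count,
      DefaultTargetCapacityType: 'spot',
    },
  };
}

// Hypothetical launch template name and instance types.
const request = buildSpotFleetRequest('github-runner', ['m5.large', 'm5a.large', 'c5.large'], 1);
console.log(JSON.stringify(request, null, 2));
```

With Type set to instant, the launched instance IDs (and any per-pool errors) are returned directly in the CreateFleet response, which is what makes it usable as a synchronous replacement for RunInstances.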

Another thing to consider in the implementation is the newly released attribute-based instance type selection (ABS).
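With attribute-based instance type selection, an override describes requirements instead of naming instance types. A hedged sketch, assuming a runner needs at least minVCpus vCPUs and minMemoryMiB MiB of memory; the helper name, the subnet IDs, and the choice to cap vCPUs at twice the minimum are illustrative only. Per the EC2 API, an override sets either InstanceType or InstanceRequirements, not both.

```typescript
// Illustrative subset of the EC2 InstanceRequirements structure used by
// attribute-based instance type selection (field names follow the EC2 API).
interface InstanceRequirements {
  VCpuCount: { Min: number; Max?: number };
  MemoryMiB: { Min: number; Max?: number };
}

// An override sets either InstanceType or InstanceRequirements, never both.
interface AbsOverride {
  SubnetId: string;
  InstanceRequirements: InstanceRequirements;
}

// Hypothetical helper: describes what a runner needs instead of listing instance
// types, so EC2 can consider every instance type that matches.
function buildAbsOverrides(subnetIds: string[], minVCpus: number, minMemoryMiB: number): AbsOverride[] {
  return subnetIds.map((SubnetId) => ({
    SubnetId,
    InstanceRequirements: {
      // Capping at twice the minimum is an arbitrary choice to avoid oversized types.
      VCpuCount: { Min: minVCpus, Max: minVCpus * 2 },
      MemoryMiB: { Min: minMemoryMiB },
    },
  }));
}

// Hypothetical subnets; at least 2 vCPUs and 4 GiB of memory.
const overrides = buildAbsOverrides(['subnet-aaaa', 'subnet-bbbb'], 2, 4096);
console.log(JSON.stringify(overrides[0]));
```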

npalm added the good first issue and help wanted labels Nov 9, 2021
ScottGuymer self-assigned this Dec 10, 2021
npalm (Member) commented Dec 11, 2021

I did a short experiment (see gist). It creates the instances but causes two issues:

  • It also tries to launch instances in the default VPC, which fails because the required security group does not exist there. I have not yet figured out how to prevent these launch attempts. The createFleet result contains one instance in a valid subnet, but also the error mentioned above.
  • Creating the SSM parameters does not work; I have not investigated this yet.
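A likely cause of the default VPC launches is that neither the launch template nor the overrides name a subnet, in which case EC2 uses the default VPC. One way to avoid that, sketched under the assumption that the lambda knows the runner subnets, is to pin every override to a runner subnet by building the cross product of subnets and instance types. The helper name and IDs are hypothetical.

```typescript
interface SubnetOverride {
  SubnetId: string;
  InstanceType: string;
}

// Pins every override to one of the runner subnets so the fleet is not left to
// choose a subnet itself. Each subnet sits in one Availability Zone, so the
// result (subnets x instance types) enumerates the Spot pools the fleet may use.
function buildSubnetOverrides(subnetIds: string[], instanceTypes: string[]): SubnetOverride[] {
  const result: SubnetOverride[] = [];
  for (const SubnetId of subnetIds) {
    for (const InstanceType of instanceTypes) {
      result.push({ SubnetId, InstanceType });
    }
  }
  return result;
}

// Hypothetical runner subnets and instance types.
const overrides = buildSubnetOverrides(['subnet-aaaa', 'subnet-bbbb'], ['m5.large', 'c5.large']);
console.log(overrides.length); // 4
```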

npalm (Member) commented Dec 11, 2021

By replacing the runInstances API call with createFleet, the dynamic launch templates can also be removed. Finding the right Spot (or On-Demand) instances should move into this API call.

count = length(local.instance_types)

npalm (Member) commented Dec 11, 2021

Currently we specify in the launch template, via the market_type option, whether we create a Spot or an On-Demand instance. Maybe we should move this logic into the API call that creates the fleet as well.
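Moving the market type into the fleet request could look like this sketch: the caller passes spot or on-demand, and the function returns the matching TargetCapacitySpecification plus either SpotOptions or OnDemandOptions. The function name and the allocation strategies chosen (capacity-optimized for Spot, lowest-price for On-Demand) are assumptions for illustration; the field names follow the EC2 CreateFleet API.

```typescript
type MarketType = 'spot' | 'on-demand';

// Hand-written subset of the CreateFleet request fields that depend on the market type.
interface MarketOptions {
  TargetCapacitySpecification: {
    TotalTargetCapacity: number;
    DefaultTargetCapacityType: MarketType;
  };
  SpotOptions?: { AllocationStrategy: 'capacity-optimized' };
  OnDemandOptions?: { AllocationStrategy: 'lowest-price' };
}

// Decides Spot vs On-Demand per request instead of fixing market_type in the launch template.
function marketOptions(marketType: MarketType, count: number): MarketOptions {
  const base: MarketOptions = {
    TargetCapacitySpecification: { TotalTargetCapacity: count, DefaultTargetCapacityType: marketType },
  };
  return marketType === 'spot'
    ? { ...base, SpotOptions: { AllocationStrategy: 'capacity-optimized' } }
    : { ...base, OnDemandOptions: { AllocationStrategy: 'lowest-price' } };
}

const spot = marketOptions('spot', 2);
const onDemand = marketOptions('on-demand', 2);
console.log(JSON.stringify(spot), JSON.stringify(onDemand));
```

Keeping the launch template market-neutral means a single template can serve both the Spot request and an On-Demand retry.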

npalm assigned npalm and unassigned ScottGuymer Dec 21, 2021
npalm (Member) commented Dec 21, 2021

I have a working POC locally and will implement it ASAP. It will replace the loop that creates instances and add a fallback to On-Demand instances.
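The fallback could be sketched as follows, assuming createFleet is called once for Spot and, only when Spot returns fewer instances than requested, once more for the remainder as On-Demand. CreateFleetFn is a hypothetical injection point standing in for the real EC2 CreateFleet call, so the sketch runs without the AWS SDK; the response interface is a hand-written subset of the CreateFleet result.

```typescript
// Hand-written subset of the EC2 CreateFleet response (field names follow the EC2 API).
interface FleetResult {
  Instances?: { InstanceIds?: string[] }[];
  Errors?: { ErrorCode?: string; ErrorMessage?: string }[];
}

type MarketType = 'spot' | 'on-demand';

// Hypothetical injection point: in the lambda this would issue an instant-type
// CreateFleet call for the given market type and capacity.
type CreateFleetFn = (marketType: MarketType, count: number) => Promise<FleetResult>;

// Collects all instance IDs from a CreateFleet response.
function instanceIds(result: FleetResult): string[] {
  const ids: string[] = [];
  for (const instance of result.Instances ?? []) {
    ids.push(...(instance.InstanceIds ?? []));
  }
  return ids;
}

// Tries Spot first; requests only the shortfall as On-Demand.
async function createWithFallback(createFleet: CreateFleetFn, count: number): Promise<string[]> {
  const spotIds = instanceIds(await createFleet('spot', count));
  if (spotIds.length >= count) {
    return spotIds;
  }
  const onDemandIds = instanceIds(await createFleet('on-demand', count - spotIds.length));
  return [...spotIds, ...onDemandIds];
}

// Stub for illustration: Spot delivers one of two instances, On-Demand fills the rest.
const stub: CreateFleetFn = async (marketType, n) =>
  marketType === 'spot'
    ? { Instances: [{ InstanceIds: ['i-spot1'] }], Errors: [{ ErrorCode: 'InsufficientInstanceCapacity' }] }
    : { Instances: [{ InstanceIds: Array.from({ length: n }, (_, i) => `i-od${i + 1}`) }] };

createWithFallback(stub, 2).then((ids) => console.log(ids)); // [ 'i-spot1', 'i-od1' ]
```

Requesting only the shortfall as On-Demand keeps any Spot instances that did launch, rather than discarding a partial result.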
