Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminate orphaned instances from CreateFleet requests #194

Merged
merged 2 commits into from
Oct 9, 2020

Conversation

haugenj
Copy link
Contributor

@haugenj haugenj commented Oct 2, 2020

When using CreateFleet to provision new instances, there's a chance that instances are returned from the call but then not attached to the ASG. Instances that are not attached to the ASG will never be terminated by Esclator, so they'll stay running in the user's account indefinitely. This happens in two cases:

  1. When not all instances are in a "ready" state before the instance ready timeout is hit. In this case the full number of instances become orphaned
  2. When a call to AttachInstances fails. In this case any instances that haven't been attached to the instance become orphaned (happens in batches of 20)

Here's a screenshot of this happening to me for this scenario:

  • 1 instance was requested but failed to be ready before the timeout
  • next loop 1 instance was requested and was ready in time and attached to the ASG
  • 5 instances were requested but at least one wasn't ready in time
  • 5 instances were requested and successfully attached to the ASG
  • When my job was finished half of the instances did not terminate

image

This change handles these cases and calls TerminateInstances with the list of instances that haven't been attached to the ASG. To try and mitigate the issue of these failures happening indefinitely, say if the timeout value is too low, a fatal is logged if this happens three times in a row.

Copy link
Member

@Jacobious52 Jacobious52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @haugenj

Thanks for your PR, great to see some more contribution to the Fleet support!

Just a few changes and discussion points from me, sorry for the comment spam!

pkg/cloudprovider/aws/aws.go Outdated Show resolved Hide resolved
pkg/cloudprovider/aws/aws.go Outdated Show resolved Hide resolved
pkg/cloudprovider/aws/aws.go Outdated Show resolved Hide resolved
pkg/cloudprovider/aws/aws.go Show resolved Hide resolved
pkg/cloudprovider/aws/aws_test.go Show resolved Hide resolved
pkg/cloudprovider/aws/aws.go Outdated Show resolved Hide resolved
@Jacobious52 Jacobious52 added enhancement New feature or request bug Something isn't working labels Oct 6, 2020
Copy link
Member

@Jacobious52 Jacobious52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work on the new tests!

@awprice awprice merged commit cc37552 into atlassian:master Oct 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants