:instance_exists waiter not waiting for existence #859

Closed
tyler-ball opened this issue Jul 2, 2015 · 10 comments
Labels
feature-request A feature should be added or improved.

Comments

@tyler-ball

I'm confused by some behavior I just saw. I'm using the SDK V2. In test kitchen I send a create_instance request and immediately execute a wait_until :instance_exists. After that I try to tag the instance.

I just got the following error during this - Aws::EC2::Errors::InvalidInstanceIDNotFound The instance ID 'i-b3dbea44' does not exist.

Am I using the wrong waiter? I would not expect to get a does not exist error immediately after using the instance_exists waiter.

tyler-ball changed the title from ":wait_for_existence waiter not waiting for existence" to ":instance_exists waiter not waiting for existence" on Jul 2, 2015
@trevorrowe
Member

The :instance_exists waiter works by polling the Amazon EC2 DescribeInstances operation until it returns the instance in a response. Unfortunately, this does not mean that the Amazon EC2 API is ready for this resource to be tagged. Currently there is no API that the SDK can call to determine whether the resource is taggable yet. My best suggestion would be to add a short artificial delay after the waiter successfully returns, to give the service time to become consistent.
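A minimal sketch of that workaround, written against a duck-typed client so it can be exercised without credentials. `wait_then_tag` and `settle_delay` are hypothetical names, and the 5-second default is an arbitrary guess rather than a documented value. `wait_until` and `create_tags` are the `Aws::EC2::Client` methods from SDK v2.

```ruby
# Sketch only, not part of the SDK.
# `client` is assumed to behave like Aws::EC2::Client (SDK v2).
# `settle_delay` (seconds) is an arbitrary pause, not a value EC2 documents.
def wait_then_tag(client, instance_id, tags, settle_delay: 5)
  # Poll DescribeInstances until the instance shows up in a response.
  client.wait_until(:instance_exists, instance_ids: [instance_id])

  # Give the service a moment to become consistent before tagging.
  sleep(settle_delay)

  client.create_tags(resources: [instance_id], tags: tags)
end
```

In real use `client` would be `Aws::EC2::Client.new` and `tags` something like `[{ key: 'Name', value: 'test' }]`. A fixed delay only makes the race less likely; it does not remove it.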

@tyler-ball
Author

Then this seems like an issue with the Amazon EC2 API. It is returning a message saying that 'yes, the instance does exist' but then fails when trying to update that existence. Perhaps there are multiple meanings of the word existence from the API? An Object with an ID has been created, but it doesn't have a state of existence that allows an update to it?

Whatever the answer is, the current state of this API is still very confusing.

@tyler-ball
Author

As a follow-up to this, I also found this code block failing:

server = ec2.create_instance(instance_data)
ec2.client.wait_until(
  :instance_exists,
  :instance_ids => [server.id]
)

The wait_until is failing with the error Aws::Waiters::Errors::UnexpectedError stopped waiting due to an unexpected error: The instance ID 'i-314ab4c7' does not exist

But isn't that why I'm calling the waiter? I want it to poll until that instance exists?

@trevorrowe
Member

The :instance_exists waiter should ignore Aws::EC2::Errors::InvalidInstanceIDNotFound errors returned by the service. I verified this locally, shown below with some logging:

ec2.wait_until(:instance_exists, instance_ids:['i-12345678'])
[Aws::EC2::Client 400 0.794136 0 retries] describe_instances(instance_ids:["i-12345678"]) Aws::EC2::Errors::InvalidInstanceIDNotFound The instance ID 'i-12345678' does not exist
[Aws::EC2::Client 400 0.459063 0 retries] describe_instances(instance_ids:["i-12345678"]) Aws::EC2::Errors::InvalidInstanceIDNotFound The instance ID 'i-12345678' does not exist
[Aws::EC2::Client 400 0.370968 0 retries] describe_instances(instance_ids:["i-12345678"]) Aws::EC2::Errors::InvalidInstanceIDNotFound The instance ID 'i-12345678' does not exist

This makes me wonder if Amazon EC2 is sometimes returning a different error code. Can you rescue the Aws::Waiters::Errors::UnexpectedError, and then log the original error:

begin
  # call waiter here
rescue Aws::Waiters::Errors::UnexpectedError => unexpected
  puts unexpected.error.class.name
  puts unexpected.error.message
end

@tyler-ball
Author

We are using version 2.1.11 (just for more information). I'll add the block you suggested and see if I can get some more information, but you can see test-kitchen/kitchen-ec2#184 and grep the page for InvalidInstanceIDNotFound to see that it's still throwing that error.

After re-reading our two stack traces it looks like we have seen two different exceptions. First we saw Aws::Waiters::Errors::UnexpectedError with the message stopped waiting due to an unexpected error: The instance ID 'i-314ab4c7' does not exist

Then I switched from using ec2.client.wait_until to using instance.wait_until_exists - they both call the same AWS API though (describe_instances), correct?

Now we see the error Aws::EC2::Errors::InvalidInstanceIDNotFound with the message The instance ID 'i-c23dfb34' does not exist

@trevorrowe
Member

The Aws::EC2::Instance#wait_until_exists method calls Aws::EC2::Client#wait_until(:instance_exists, ...), passing the ID of the current instance. They will both result in the same #describe_instances call on the client. These waiter methods should not be raising the errors you are describing:

Aws::EC2::Errors::InvalidInstanceIDNotFound

This error should automatically trigger another polling attempt. If the configured number of polling attempts lapses, then an Aws::Waiters::Errors::TooManyAttemptsError should be raised. This error is explicitly called out here in the waiter definition:

https://github.com/aws/aws-sdk-ruby/blob/master/aws-sdk-core/apis/ec2/2015-04-15/waiters-2.json#L14-L18

Aws::Waiters::Errors::UnexpectedError

This error is reserved for service responses that contain an error which is not defined in the waiter definition. An error can trigger a waiter success, fail, or retry state. Any unexpected errors are wrapped in an instance of Aws::Waiters::Errors::UnexpectedError and then raised. You can access the original error by calling #error on the instance of Aws::Waiters::Errors::UnexpectedError.
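To make the retry / give up / unexpected distinction concrete, here is a toy model of the polling loop described above. It is not the SDK's implementation: `toy_wait`, `ToyTooManyAttempts`, and `ToyUnexpected` are hypothetical names, and the model treats any block call that does not raise as success, whereas the real waiters also match on the contents of the response.

```ruby
# Toy model of the waiter behaviour described above. NOT the SDK's code.

# Stand-in for Aws::Waiters::Errors::TooManyAttemptsError.
class ToyTooManyAttempts < StandardError; end

# Stand-in for Aws::Waiters::Errors::UnexpectedError; keeps the original error.
class ToyUnexpected < StandardError
  attr_reader :error

  def initialize(error)
    @error = error
    super("stopped waiting due to an unexpected error: #{error.message}")
  end
end

# retryable:    error classes that the (hypothetical) waiter definition
#               marks with the "retry" state
# max_attempts: how many times to call the block before giving up
def toy_wait(retryable:, max_attempts:, delay: 0)
  max_attempts.times do |i|
    begin
      return yield # no error raised: treated as success in this model
    rescue *retryable
      # Listed in the definition: poll again, pausing between attempts.
      sleep(delay) if delay > 0 && i < max_attempts - 1
    rescue StandardError => e
      # Not listed in the definition: wrap it and stop waiting.
      raise ToyUnexpected.new(e)
    end
  end
  raise ToyTooManyAttempts, "stopped waiting after #{max_attempts} attempts"
end
```

In this model, a waiter whose `retryable` list includes the not-found error keeps polling until the attempts run out, while one whose list omits it surfaces the same error wrapped in `ToyUnexpected`, with the original available through `#error`.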

How it should work

Based on the expected behavior, you should not be getting either of these errors, and the waiter should continue polling. Additionally, I am unable to reproduce the issue myself. See the following examples:

This example makes up a fake instance ID and attempts to poll. This one polls the configured number of attempts and then gives up:

ec2 = Aws::EC2::Resource.new
instance = ec2.instance('i-12345678')
instance.wait_until_exists do |waiter|
  waiter.max_attempts = 2
  waiter.delay = 1
end
# raises Aws::Waiters::Errors::TooManyAttemptsError after 2 failed attempts

This one starts a new instance and polls it straightaway:

ec2 = Aws::EC2::Resource.new
instances = ec2.create_instances({
  image_id: 'ami-1ccae774',
  instance_type: 'm1.large',
  min_count: 1,
  max_count: 1
})
instance = instances.first
instance.wait_until_exists
instance.terminate

@tyler-ball
Author

I figured out why I was seeing UnexpectedError - the :instance_exists waiter automatically retries InvalidInstanceIDNotFound errors while the :instance_running waiter does not. I was switching between :instance_exists and :instance_running trying to get my system working, so I must have seen UnexpectedError wrapping the InvalidInstanceIDNotFound error when using the :instance_running waiter. This was my incorrect assumption about the waiters - I thought that :instance_running would retry the NotFound error the same way the :instance_exists waiter did. #themoreyouknow
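For an SDK version where :instance_running does not retry the not-found error, one possible workaround is to chain the two waiters. This is a sketch under that assumption: `wait_until_running` is a hypothetical helper, and `client` is assumed to behave like `Aws::EC2::Client`.

```ruby
# Sketch only, not part of the SDK.
# `client` is assumed to behave like Aws::EC2::Client (SDK v2).
def wait_until_running(client, instance_id)
  params = { instance_ids: [instance_id] }

  # First wait with the waiter that tolerates InvalidInstanceIDNotFound...
  client.wait_until(:instance_exists, params)

  # ...then with the one that watches the instance state.
  client.wait_until(:instance_running, params)
end
```

Because the service is eventually consistent, this likely narrows the window for the not-found error rather than closing it.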

But I still see my original issue - the :instance_exists waiter has returned successfully (meaning that the #describe_instances call returned successfully). However, I then try to tag the instance and I get the InvalidInstanceIDNotFound error.

I understand that the instance may not be taggable yet, but it still seems very weird that it is returning a NotFound error when the #describe_instances call was just successful.

trevorrowe added a commit that referenced this issue Aug 4, 2015
The `:instance_running` and `:instance_state_ok` waiters were
raising an unexpected error, i.e. `Aws::EC2::Errors::InvalidInstanceIDNotFound`
instead of retrying. This could happen only with newly launched instances.

See #859.
@trevorrowe
Member

Good catch. I've added logic to the :instance_running and :instance_state_ok waiters to retry the instance id not found errors. This should improve the experience for polling on newly launched instances.

As for the originally reported issue, it seems unfortunate that there is not an obvious way, via the Amazon EC2 API, to know when an instance may be tagged. I'm going to play around with the DescribeTags API operation and see if that might provide more information.

@tyler-ball
Author

@trevorrowe Thanks so much for all the support!

@trevorrowe
Member

At this point, I'm inclined to close this issue. I will certainly watch out for the opportunity to improve this, but currently there appears to be no deterministic way to wait until the instance is taggable without simply trying, rescuing the error, and retrying until the system is internally consistent.
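The try, rescue, and retry approach might look roughly like the sketch below. `tag_with_retry` is a hypothetical helper, not an SDK method, and the attempt count and delay are arbitrary. The error class is passed in so the helper does not depend on the gem being loaded; with the SDK it would be `Aws::EC2::Errors::InvalidInstanceIDNotFound`.

```ruby
# Sketch only, not part of the SDK.
# `client` is assumed to behave like Aws::EC2::Client (SDK v2).
# `retry_on` is the error class to retry; max_attempts and delay are guesses.
def tag_with_retry(client, instance_id, tags, retry_on:, max_attempts: 5, delay: 2)
  attempts = 0
  begin
    attempts += 1
    client.create_tags(resources: [instance_id], tags: tags)
  rescue retry_on
    # Give up and re-raise the original error once the attempts are used up.
    raise if attempts >= max_attempts

    sleep(delay)
    retry
  end
end
```

Any error other than `retry_on` propagates immediately, so a genuinely invalid request is not retried.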
