separated logic for warm pools polling scenarios and do not fail on t… #3055

lydiafilipe · 2021-10-07T23:25:14Z

Commit: separated logic for warm pools polling scenarios and do not fail on throttling or transient server errors once state obtained

Summary

EC2 instances can, in principle, remain in a warm pool indefinitely. While the agent is polling during that time, it might be throttled or IMDS might be experiencing issues. Neither should cause the agent to fail.
This change makes the agent tolerate errors that are likely transient, including throttling and certain 5xx errors, once the target lifecycle state has been obtained. Before the state has been obtained, all errors except 404 errors still cause failure.
The retry count for querying IMDS has also been increased from 3 to 5.
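
For readers of this thread, the error policy described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: `statusError`, `isTransient`, `isNotFound`, and `shouldKeepPolling` are hypothetical names, the exact set of 5xx codes is an assumption, and treating a 404 as fatal once the state is known is also an assumption (the description does not say).

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// statusError is a hypothetical error type carrying the HTTP status of a
// failed IMDS request. The agent's real error types are not shown in this PR.
type statusError struct {
	code int
}

func (e *statusError) Error() string {
	return fmt.Sprintf("IMDS request failed with status %d", e.code)
}

// isTransient reports whether err looks like throttling or a temporary
// server-side failure. The list of codes is an assumption for illustration.
func isTransient(err error) bool {
	var se *statusError
	if !errors.As(err, &se) {
		return false
	}
	switch se.code {
	case http.StatusTooManyRequests,
		http.StatusInternalServerError,
		http.StatusBadGateway,
		http.StatusServiceUnavailable,
		http.StatusGatewayTimeout:
		return true
	}
	return false
}

// isNotFound reports whether err is a 404 response.
func isNotFound(err error) bool {
	var se *statusError
	return errors.As(err, &se) && se.code == http.StatusNotFound
}

// shouldKeepPolling applies the policy from the summary: before the state is
// obtained only a 404 is tolerated; afterwards only transient errors are.
// Treating a 404 as fatal after the state is known is an assumption.
func shouldKeepPolling(err error, stateObtained bool) bool {
	if err == nil {
		return true
	}
	if !stateObtained {
		return isNotFound(err)
	}
	return isTransient(err)
}

func main() {
	throttled := &statusError{code: http.StatusTooManyRequests}
	fmt.Println("before state obtained:", shouldKeepPolling(throttled, false))
	fmt.Println("after state obtained: ", shouldKeepPolling(throttled, true))
}
```

Keeping the decision in one small function makes the "before" and "after" cases easy to unit test separately.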

Implementation details

Separated polling for the first published value from the subsequent polling that waits for the instance to be in service.
Modified the waitUntilInstanceInService method to continue polling on certain errors once the target state has been obtained.
Added a separate method, pollUntilLifecycleStateObtained, for the initial polling for any value.
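
The two-phase structure listed above could look roughly like the sketch below. It is a simplified stand-in, not the actual agent code: `fetchFunc`, `errNotFound`, `errTransient`, and the `maxAttempts` bound are hypothetical, a 404 is assumed to mean "value not published yet", and `"Warmed:Stopped"` is just an example state value.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// inServiceState is the state at which polling stops ("InService" in the diff).
const inServiceState = "InService"

// Hypothetical sentinel errors standing in for a 404 and for
// throttling or temporary 5xx responses from IMDS.
var (
	errNotFound  = errors.New("target lifecycle state not published yet")
	errTransient = errors.New("throttled or temporary server error")
)

// fetchFunc stands in for the agent's IMDS query.
type fetchFunc func() (string, error)

// pollUntilObtained is phase one: poll until any state is published.
// Only errNotFound is tolerated; every other error is returned immediately.
func pollUntilObtained(fetch fetchFunc, wait time.Duration, maxAttempts int) (string, error) {
	if maxAttempts < 1 {
		return "", errors.New("maxAttempts must be at least 1")
	}
	var lastErr error
	for i := 0; i < maxAttempts; i++ {
		state, err := fetch()
		if err == nil {
			return state, nil
		}
		if !errors.Is(err, errNotFound) {
			return "", err
		}
		lastErr = err
		time.Sleep(wait)
	}
	return "", lastErr
}

// waitUntilInService runs phase one, then phase two: keep polling until the
// state is InService, skipping over transient errors along the way.
func waitUntilInService(fetch fetchFunc, wait time.Duration, maxAttempts int) error {
	state, err := pollUntilObtained(fetch, wait, maxAttempts)
	if err != nil {
		return err
	}
	for state != inServiceState {
		time.Sleep(wait)
		next, err := fetch()
		if err != nil {
			if errors.Is(err, errTransient) {
				continue // keep the last known state and poll again
			}
			return err
		}
		state = next
	}
	return nil
}

func main() {
	// Scripted responses: not published, warmed, transient error, in service.
	responses := []struct {
		state string
		err   error
	}{
		{"", errNotFound},
		{"Warmed:Stopped", nil},
		{"", errTransient},
		{inServiceState, nil},
	}
	i := 0
	fetch := func() (string, error) {
		r := responses[i]
		if i < len(responses)-1 {
			i++
		}
		return r.state, r.err
	}
	fmt.Println("result:", waitUntilInService(fetch, time.Millisecond, 3))
}
```

Passing the fetch function and wait duration as parameters is a choice made for this sketch so the loop can be exercised without real IMDS calls or long sleeps.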

Testing

Ran unit tests with changes and added new tests
Ran code on EC2 and verified polling occurred in logs

Description for the changelog

Separated logic for warm pools polling scenarios and do not fail on throttling or transient server errors once state obtained

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

angelcar · 2021-10-08T00:18:56Z

agent/app/agent.go

+		return err
+	}
+	// Poll while the instance is in a warmed state until it is going to go into service
+	for targetState != "InService" {


nit: can we use inServiceState here?

angelcar · 2021-10-08T00:25:28Z

agent/app/agent.go

+	// Poll while the instance is in a warmed state until it is going to go into service
+	for targetState != "InService" {
+		time.Sleep(pollWaitDuration)
+		targetState, err = agent.getTargetLifecycle(maxRetries)


I think the retries make sense for the first stage (i.e. pollUntilTargetLifecyclePresent), but what do you think of invoking this function as agent.getTargetLifecycle(1) at this point, since this loop will retry anyway?

lydiafilipe · 2021-10-11

Good point. The main advantage I see in keeping retries is the case where the API call returns an error just as the instance becomes ready, which would delay setup by a number of minutes. That would be an edge case, but it would not be a great experience if it occurred, so I would be inclined to keep more than one retry.

That said, thinking about it, I don't think the increase in retry count is really necessary; it should be fine at 3. In the second stage we will retry anyway, and in the first stage we wouldn't expect throttling errors to be likely and wouldn't need additional retries in the failure scenarios.
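
To make the trade-off in this thread concrete, here is a small sketch of a per-poll retry wrapper. It is an assumption about the shape of the call, not the actual `getTargetLifecycle`: `getWithRetries`, `fetchFunc`, and `errThrottled` are hypothetical, and `maxRetries` is treated here as the total number of attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// errThrottled is a hypothetical stand-in for a throttling error from IMDS.
var errThrottled = errors.New("throttled")

// fetchFunc stands in for a single IMDS request.
type fetchFunc func() (string, error)

// getWithRetries calls fetch up to maxRetries times and returns the first
// successful result, or the last error if every attempt fails.
// Values below 1 are treated as a single attempt.
func getWithRetries(fetch fetchFunc, maxRetries int) (string, error) {
	if maxRetries < 1 {
		maxRetries = 1
	}
	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		state, err := fetch()
		if err == nil {
			return state, nil
		}
		lastErr = err
	}
	return "", lastErr
}

func main() {
	calls := 0
	// Fails once, then reports that the instance is in service.
	flaky := func() (string, error) {
		calls++
		if calls == 1 {
			return "", errThrottled
		}
		return "InService", nil
	}
	state, err := getWithRetries(flaky, 3)
	fmt.Println(state, err, calls)
}
```

With a single attempt, one failed call right as the instance becomes ready costs a full pollWaitDuration before the outer loop tries again; with a few attempts the same failure is absorbed within the same poll.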

…hrottling or transient server errors once state obtained (aws#3055) Co-authored-by: Lydia Filipe <fillydia@amazon.com>

lydiafilipe added the bot/test label Oct 7, 2021

amazon-ecs-bot removed the bot/test label Oct 7, 2021

angelcar reviewed Oct 8, 2021

separated logic for warm pools polling scenarios and do not fail on t…

e9c65fa

…hrottling or transient server errors once state obtained

lydiafilipe force-pushed the feature/warm_pools branch from f079fbe to e9c65fa October 12, 2021 17:35

lydiafilipe added the bot/test label Oct 12, 2021

amazon-ecs-bot removed the bot/test label Oct 12, 2021

angelcar approved these changes Oct 12, 2021

sharanyad approved these changes Oct 12, 2021

lydiafilipe merged commit 3445809 into aws:feature/warm_pools Oct 13, 2021

lydiafilipe mentioned this pull request Feb 2, 2022

Warm Pools Support #3123

Merged
