Can't update a service that is in UPDATE_ROLLBACK_FAILED mode #4333

Closed
dmathewwws opened this issue Jan 8, 2023 · 5 comments
Labels
guidance Issue requesting guidance or information about usage

Comments

@dmathewwws

dmathewwws commented Jan 8, 2023

I have a service called api that has 2 stacks: api & api-AddonsStack (One of the addons is an OpenSearch DB).

I had an OpenSearch DB with an EngineVersion of 1.3, and I tried to upgrade it to EngineVersion 2.3 (using copilot, by changing the yml file). My OpenSearch domain has upgraded to 2.3, but I got UPDATE_ROLLBACK_FAILED for both stacks.

When I go to the Stacks section in the AWS Console, these are the details I get for the UPDATE_ROLLBACK_FAILED error:
Embedded stack arn:aws:...api-AddonsStack was not successfully updated. Currently in UPDATE_ROLLBACK_FAILED with reason: The following resource(s) failed to update: [ElasticsearchDomain].

More specifically when I go to the Resource, it gives this error:
Resource handler returned message: "Error occurred during operation 'Failed to submit upgrade. Upgrade from OpenSearch_2.3 to OpenSearch_1.0 not supported.

When I go to the Template section for the Addon stack, I unexpectedly do see:

EngineVersion: 'OpenSearch_1.0'

even though that is not what my yml file says

EngineVersion: 'OpenSearch_2.3'

Furthermore, when I view drift results, I see Elasticsearch as the service that isn't matching:

ElasticsearchDomain | x | AWS::OpenSearchService::Domain | MODIFIED | 2023-01-08 10:11:12 UTC-0800 

I tried 2 solutions:

  1. I tried to add a change set to this Addon stack (with the update of EngineVersion to 2.3), but I get the following error when I try to do so:

Stack:arn:aws:...api-AddonsStack is in UPDATE_ROLLBACK_FAILED state and can not be updated.

  2. I tried to run the aws-cli command
aws cloudformation continue-update-rollback --stack-name api-AddonsStack --resources-to-skip ElasticsearchDomain

but get the following error: An error occurred (ValidationError) when calling the ContinueUpdateRollback operation: RollbackUpdatedStack cannot be invoked on child stacks

I also tried

aws cloudformation continue-update-rollback --stack-name api --resources-to-skip AddonsStack

but get the following error: Nested stack resources can only be skipped when their embedded stack statuses are one of [DELETE_COMPLETE, DELETE_IN_PROGRESS, DELETE_FAILED]

@Lou1415926
Contributor

Hello @dmathewwws 👋🏼! I can help you with that. Before I provide my recommendations, I just want to confirm a few things:

  1. Is the ElasticsearchDomain the only resource that is currently in UPDATE_FAILED (note that this is not UPDATE_ROLLBACK_FAILED)?
  2. Is ElasticsearchDomain a resource of type OpenSearch:Domain?
  3. I'm interested in the events that triggered your stack to roll back in the first place. Can you look at your addon stack and find the last resource in the UPDATE_FAILED state before the whole stack went into UPDATE_ROLLBACK_IN_PROGRESS?

It will be easier if you can share a screenshot of the event logs, but I understand if you prefer not to do so.
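
If a screenshot isn't convenient, the same events can probably be pulled from the CLI. This is only a minimal sketch, assuming the AWS CLI is already configured for the right account and region. The function name `list_failed_events` and the stack name in the usage comment are placeholders, not something Copilot provides:

```shell
# Print the FAILED events of a stack, oldest first within each page of results.
# $1: stack name or ARN (for the addons stack, pass its full name or ARN).
list_failed_events() {
  aws cloudformation describe-stack-events \
    --stack-name "$1" \
    --query "reverse(StackEvents[?contains(ResourceStatus, 'FAILED')])[*].[Timestamp, LogicalResourceId, ResourceStatus, ResourceStatusReason]" \
    --output text
}

# Usage (hypothetical stack name):
#   list_failed_events <app>-<env>-api-AddonsStack-<random string>
```

The filter keeps UPDATE_FAILED as well as UPDATE_ROLLBACK_FAILED rows, so the first row printed should be the event that started the rollback.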

@Lou1415926 Lou1415926 added the guidance Issue requesting guidance or information about usage label Jan 9, 2023
@dmathewwws
Author

dmathewwws commented Jan 9, 2023

Hey @Lou1415926 thanks for the response.

  1. Yes, it's the only resource in 'UPDATE_FAILED' state
  2. Yes, it's of type 'AWS::OpenSearchService::Domain'
  3. These are the events in order:
    2023-01-07 16:29:00 UTC-0800 | ElasticsearchDomain | UPDATE_IN_PROGRESS | -
    2023-01-07 18:00:29 UTC-0800 | ElasticsearchDomain | UPDATE_FAILED | Resource handler returned message: "Exceeded attempts to wait"
    2023-01-07 18:00:31 UTC-0800 | api-AddonsStack | UPDATE_ROLLBACK_IN_PROGRESS | The following resource(s) failed to update: [ElasticsearchDomain].
    2023-01-07 18:03:00 UTC-0800 | ElasticsearchDomain | UPDATE_IN_PROGRESS | -
    2023-01-07 18:03:01 UTC-0800 | ElasticsearchDomain | UPDATE_FAILED | Resource handler returned message: "Error occurred during operation 'Failed to submit upgrade. Upgrade from OpenSearch_1.3 to OpenSearch_1.0 not supported.
    2023-01-07 18:03:02 UTC-0800 | api-AddonsStack | UPDATE_ROLLBACK_FAILED | The following resource(s) failed to update: [ElasticsearchDomain].

@Lou1415926
Contributor

To get your stack back to an updatable state, we need to skip the rollback of ElasticsearchDomain:

aws cloudformation continue-update-rollback \
  --stack-name <app>-<env>-api \
  --resources-to-skip <app>-<env>-AddonsStack-<random string>.ElasticsearchDomain

This may look similar to what you attempted here:

I tried to add a change set to this Addon stack (with the update of EngineVersion to 2.3), but I get the following error when I try to do so:

However, instead of skipping the whole addon stack, this is skipping only the ElasticsearchDomain resource in the addon stack.

After running this command, your service stack and the addons stack should hopefully be in the UPDATE_ROLLBACK_COMPLETE state. At this point, all resources are rolled back, except for ElasticsearchDomain, which should still have OpenSearch_1.3 for EngineVersion.
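
For reference, here is a rough sketch of how the `--resources-to-skip` value could be assembled instead of typed by hand. It assumes the nested stack's logical ID inside the service stack is `AddonsStack`, and that the nested stack's name is the second `/`-separated field of its ARN. `nested_stack_name` and `skip_nested_resource` are made-up helper names, so treat this as an illustration rather than a supported workflow:

```shell
# Extract the stack name from a stack ARN of the form
# arn:aws:cloudformation:<region>:<account>:stack/<stack-name>/<uuid>
nested_stack_name() {
  printf '%s\n' "$1" | cut -d/ -f2
}

# Continue the rollback of the parent stack while skipping one resource
# that lives inside a nested stack.
# $1: parent (service) stack name
# $2: logical ID of the nested stack in the parent (assumed: AddonsStack)
# $3: logical ID of the resource to skip (e.g. ElasticsearchDomain)
skip_nested_resource() {
  parent_stack="$1"
  nested_logical_id="$2"
  resource_logical_id="$3"

  # The physical ID of a nested stack resource is the nested stack's ARN.
  nested_arn=$(aws cloudformation describe-stack-resources \
    --stack-name "$parent_stack" \
    --logical-resource-id "$nested_logical_id" \
    --query 'StackResources[0].PhysicalResourceId' \
    --output text) || return 1

  aws cloudformation continue-update-rollback \
    --stack-name "$parent_stack" \
    --resources-to-skip "$(nested_stack_name "$nested_arn").$resource_logical_id"
}

# Usage (placeholders as in the command above):
#   skip_nested_resource <app>-<env>-api AddonsStack ElasticsearchDomain
```

The command is always issued against the parent stack, which is why calling it directly on the addons stack was rejected with "cannot be invoked on child stacks".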

@Lou1415926
Contributor

I see the event that triggered the rollback was:

2023-01-07 18:00:29 UTC-0800 | ElasticsearchDomain | UPDATE_FAILED | Resource handler returned message: "Exceeded attempts to wait"

Do you know what is causing it?

I was looking at this documentation, and it seems like:

  1. When you have DomainName field, you can't perform a replacement.
  2. When you don't set EnableVersionUpgrade to true, a change of EngineVersion results in a replacement.

Does this sound like what's happening in your case?

Sorry if I'm getting a little ahead 🙇🏼‍♀️ !!
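
To make the second point concrete: as far as I can tell from the documentation, `EnableVersionUpgrade` is set under the resource's `UpdatePolicy` attribute, next to `Properties`. Below is a small sketch of a check for it, assuming the addon template is a plain CloudFormation YAML file. `has_version_upgrade_policy` and the file path are hypothetical, and the check is a crude text match, not a YAML parser:

```shell
# Expected shape in the addon template:
#   ElasticsearchDomain:
#     Type: AWS::OpenSearchService::Domain
#     UpdatePolicy:
#       EnableVersionUpgrade: true
#     Properties:
#       EngineVersion: 'OpenSearch_2.3'

# Succeeds if the file contains a line "EnableVersionUpgrade: true".
# $1: path to the addon template (hypothetical, e.g. your OpenSearch addon yml)
has_version_upgrade_policy() {
  grep -Eq '^[[:space:]]*EnableVersionUpgrade:[[:space:]]*true[[:space:]]*$' "$1"
}

# Usage:
#   has_version_upgrade_policy path/to/addon.yml || echo "EnableVersionUpgrade is not set"
```

With that policy in place, a change of EngineVersion should be applied as an in-place upgrade rather than a replacement.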

@dmathewwws
Author

yes!

I do have a domain name set and I didn't set EnableVersionUpgrade.

Thanks for your help @Lou1415926. My service is up to date and running now :)
