ECS service creation intermittent failures with "Error: ECS service not created" #24565

Closed
1rjt opened this issue May 5, 2022 · 2 comments · Fixed by #25641
Labels: bug, eventual-consistency, service/ecs


1rjt commented May 5, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

Terraform v1.0.10
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v4.12.1

Affected Resource(s)

  • aws_ecs_service

Terraform Configuration Files

Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.

# Happens intermittently on several different ecs_service definitions. Here is one (anonymized) example

resource "aws_ecs_service" "some_ecs_service" {
  name            = "some_ecs_service"
  cluster         = aws_ecs_cluster.some_cluster.name
  task_definition = aws_ecs_task_definition.some_task_definition.arn
  desired_count   = 2
  network_configuration {
    subnets          = var.some_subnets
    security_groups  = var.some_sg_ids
    assign_public_ip = false
  }
  capacity_provider_strategy {
    capacity_provider = var.some_capacity_provider_name
    weight            = 10
    base              = 1
  }
  load_balancer {
    target_group_arn = var.some_target_group_arn
    container_name   = "some_container"
    container_port   = 80
  }
}

Debug Output

Here is the anonymized debug output for the relevant resource.
https://gist.github.com/1rjt/bf4b303c9cab11e265775b41c0dffc15

Expected Behavior

ECS service created.

Actual Behavior

`terraform apply` exits with an error:

Error: ECS service not created: [ECS_SERVICE_ARN]

(snippet from debug output)

2022-05-05T02:15:01.282Z [DEBUG] provider.terraform-provider-aws_v4.12.1_x5: [aws-sdk-go] {"failures":[{"arn":"arn:aws:ecs:SOME-AWS-REGION:SOME-AWS-ACCOUNT-ID:service/some_ecs_service","reason":"MISSING"}],"services":[]}: timestamp=2022-05-05T02:15:01.282Z
2022-05-05T02:15:01.283Z [TRACE] maybeTainted: module.SOME-MODULE.module.some_file.aws_ecs_service.some_ecs_service[0] encountered an error during creation, so it is now marked as tainted
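
For anyone triaging similar logs, here is a minimal Python sketch that parses the response body from the snippet above and lists the ARNs reported as MISSING. The helper name `missing_services` is made up for illustration; it is not part of Terraform or the AWS SDK.

```python
import json

# Response body copied from the debug log above (the ARN is anonymized there).
RAW = (
    '{"failures":[{"arn":"arn:aws:ecs:SOME-AWS-REGION:SOME-AWS-ACCOUNT-ID:'
    'service/some_ecs_service","reason":"MISSING"}],"services":[]}'
)


def missing_services(response):
    """Return the ARNs that the response lists under failures with reason MISSING."""
    return [
        failure["arn"]
        for failure in response.get("failures", [])
        if failure.get("reason") == "MISSING"
    ]


response = json.loads(RAW)
missing = missing_services(response)
print(missing)
```

An empty `services` list together with a `MISSING` failure for the ARN that was just created is the signature of this error.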

Steps to Reproduce

Only happens intermittently when doing 'terraform apply' on an ECS cluster with 14 services.

  1. terraform apply

Important Factoids

The problem seems to happen when creating an ECS cluster with 14 services and 10 capacity providers. We create/destroy such clusters many times per day and it only happens in approximately 1 in 10 "terraform apply" runs.

When inspecting AWS CloudTrail I can see that the ECS CreateService call was received and returned a correct response. The CloudTrail event response includes the cluster details plus "status": "ACTIVE", so it all looks OK.

Then within the same second I can see a call to "DescribeServices" with the ARN of the service that caused the error as a request parameter. The "DescribeServices" call seems to get no results and causes the error.

Perhaps this is an eventual-consistency race condition? Is a "DescribeServices" call made immediately after "CreateService" not always guaranteed to return the new service?
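
If the eventual-consistency guess is right, the usual mitigation is to retry the describe call for a short time before treating an empty result as an error. Below is a minimal Python sketch of that pattern. It is not the provider's actual fix. `wait_for_service`, `ServiceNotFound`, and `make_lagging_describe` are hypothetical names, and the fake stands in for a real DescribeServices call by returning dicts in the same shape as the logged response.

```python
import time


class ServiceNotFound(Exception):
    """Raised when the service never becomes visible (hypothetical error type)."""


def wait_for_service(describe, arn, attempts=5, delay=0.5, sleep=time.sleep):
    """Poll describe(arn) until the service is returned or attempts run out.

    describe is any callable returning a dict with "services" and "failures"
    keys, shaped like the DescribeServices response in the debug log.
    """
    for attempt in range(attempts):
        response = describe(arn)
        if response["services"]:
            return response["services"][0]
        if attempt < attempts - 1:
            sleep(delay * 2 ** attempt)  # exponential backoff between polls
    raise ServiceNotFound(arn)


def make_lagging_describe(visible_after):
    """Build a fake describe that reports MISSING for the first few calls."""
    calls = {"n": 0}

    def describe(arn):
        calls["n"] += 1
        if calls["n"] <= visible_after:
            return {"services": [], "failures": [{"arn": arn, "reason": "MISSING"}]}
        return {"services": [{"serviceArn": arn, "status": "ACTIVE"}], "failures": []}

    return describe


# Simulate a service that only becomes visible on the third describe call.
describe = make_lagging_describe(visible_after=2)
service = wait_for_service(
    describe,
    "arn:aws:ecs:SOME-AWS-REGION:SOME-AWS-ACCOUNT-ID:service/some_ecs_service",
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
print(service["status"])
```

The `sleep` parameter is injectable only so the demo runs instantly. A real caller would keep the default and tune `attempts` and `delay` to how long the lag lasts in practice.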

@github-actions (bot) added the needs-triage and service/ecs labels May 5, 2022
@justinretzolk added the bug and eventual-consistency labels and removed the needs-triage label May 5, 2022
@gdavison self-assigned this Jun 30, 2022
@github-actions (bot) added this to the v4.22.0 milestone Jul 4, 2022

github-actions bot commented Jul 8, 2022

This functionality has been released in v4.22.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!


github-actions bot commented Aug 7, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions (bot) locked this issue as resolved and limited conversation to collaborators Aug 7, 2022