failed to update bridge store for object type *bridge.bridgeEndpoint: timeout" #574

Closed
mcmonster opened this issue Oct 30, 2016 · 3 comments

Comments

@mcmonster

I have started seeing this issue intermittently in the latest agent release. I'm wondering if this is a timeout misconfiguration rather than a real bug - our images are quite large and it's not inconceivable for an image pull to take 5+ minutes.

Any guidance would be helpful. Thanks!

2016-10-30T04:59:16Z [INFO] Created docker container for task pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (CREATED->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]: gpu_compute_node(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (CREATED->RUNNING) -> 446581d35eebc464156016c6dadc23175c8d3174dfe1fa34c025ba3d118e297e
2016-10-30T04:59:16Z [INFO] Starting container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (CREATED->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]" container="gpu_compute_node4(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (CREATED->RUNNING)"
2016-10-30T04:59:16Z [INFO] Starting container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (CREATED->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]" container="gpu_compute_node2(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (CREATED->RUNNING)"
2016-10-30T04:59:16Z [INFO] Starting container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (CREATED->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]" container="gpu_compute_node3(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (CREATED->RUNNING)"
2016-10-30T04:59:16Z [INFO] Starting container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (CREATED->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]" container="gpu_compute_node(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (CREATED->RUNNING)"
2016-10-30T04:59:17Z [INFO] Saving state! module="statemanager"
2016-10-30T04:59:36Z [INFO] Error transitioning container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (CREATED->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]" container="gpu_compute_node2(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (CREATED->RUNNING)" state="RUNNING"
2016-10-30T04:59:36Z [WARN] Error with docker; stopping container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (RUNNING->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]" container="gpu_compute_node2(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->RUNNING)" err="API error (500): failed to create endpoint ecs-pre_production_gpu_compute_node-13-gpucomputenode2-dac8ccb3988fe0fd6a00 on network bridge: failed to save bridge endpoint 649374f to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout
"
2016-10-30T04:59:37Z [INFO] Saving state! module="statemanager"
2016-10-30T04:59:37Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node4 -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node4 -> RUNNING, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node4 -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node3 -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Redundant container state change for task pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (CREATED->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]: gpu_compute_node4(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED) to RUNNING, but already RUNNING
2016-10-30T04:59:37Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node3 -> RUNNING, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node3 -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Task change event module="TaskEngine" event="{TaskArn:arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 Status:RUNNING Reason: SentStatus:NONE}"
2016-10-30T04:59:37Z [INFO] Redundant container state change for task pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]: gpu_compute_node(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED) to RUNNING, but already RUNNING
2016-10-30T04:59:37Z [INFO] Stopping container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]" container="gpu_compute_node2(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED)"
2016-10-30T04:59:37Z [INFO] Stopping container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]" container="gpu_compute_node(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED)"
2016-10-30T04:59:37Z [INFO] Adding event module="eventhandler" change="TaskChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Stopping container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]" container="gpu_compute_node3(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED)"
2016-10-30T04:59:37Z [INFO] Redundant container state change for task pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]: gpu_compute_node3(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED) to RUNNING, but already RUNNING
2016-10-30T04:59:37Z [INFO] Stopping container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]" container="gpu_compute_node4(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED)"
2016-10-30T04:59:37Z [INFO] Error transitioning container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]" container="gpu_compute_node2(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED)" state="STOPPED"
2016-10-30T04:59:37Z [INFO] Error for 'docker stop' of container; assuming it's stopped anyways module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (STOPPED->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]"
2016-10-30T04:59:37Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node -> RUNNING, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node2 -> STOPPED, Reason CannotStartContainerError: API error (500): failed to create endpoint ecs-pre_production_gpu_compute_node-13-gpucomputenode2-dac8ccb3988fe0fd6a00 on network bridge: failed to save bridge endpoint 649374f to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout
, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Sending task change module="eventhandler" event="TaskChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 -> RUNNING, Known Sent: NONE" change="TaskChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node2 -> STOPPED, Reason CannotStartContainerError: API error (500): failed to create endpoint ecs-pre_production_gpu_compute_node-13-gpucomputenode2-dac8ccb3988fe0fd6a00 on network bridge: failed to save bridge endpoint 649374f to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout
, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node2 -> STOPPED, Reason CannotStartContainerError: API error (500): failed to create endpoint ecs-pre_production_gpu_compute_node-13-gpucomputenode2-dac8ccb3988fe0fd6a00 on network bridge: failed to save bridge endpoint 649374f to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout
, Known Sent: NONE"
2016-10-30T04:59:47Z [INFO] Saving state! module="statemanager"
2016-10-30T05:00:07Z [INFO] Error retrieving stats for container 446581d35eebc464156016c6dadc23175c8d3174dfe1fa34c025ba3d118e297e: context canceled
2016-10-30T05:00:07Z [INFO] Container 446581d35eebc464156016c6dadc23175c8d3174dfe1fa34c025ba3d118e297e is terminal, stopping stats collection
2016-10-30T05:00:07Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node -> STOPPED, Exit 137, , Known Sent: RUNNING"
2016-10-30T05:00:07Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node -> STOPPED, Exit 137, , Known Sent: RUNNING" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node -> STOPPED, Exit 137, , Known Sent: RUNNING"
2016-10-30T05:00:07Z [INFO] Saving state! module="statemanager"
2016-10-30T05:00:07Z [INFO] Redundant container state change for task pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (STOPPED->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (STOPPED->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]: gpu_compute_node(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (STOPPED->STOPPED) - Exit: 137 to STOPPED, but already STOPPED
2016-10-30T05:00:07Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node4 -> STOPPED, Exit 137, , Known Sent: RUNNING"
2016-10-30T05:00:07Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node4 -> STOPPED, Exit 137, , Known Sent: RUNNING" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node4 -> STOPPED, Exit 137, , Known Sent: RUNNING"
2016-10-30T05:00:07Z [INFO] Container 24f96681c9076f379cbfe9952d3ca12e597459ab24afb121558dc2278c3f252d is terminal, stopping stats collection
2016-10-30T05:00:07Z [INFO] Error retrieving stats for container 24f96681c9076f379cbfe9952d3ca12e597459ab24afb121558dc2278c3f252d: context canceled
2016-10-30T05:00:07Z [INFO] Task change event module="TaskEngine" event="{TaskArn:arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 Status:STOPPED Reason: SentStatus:RUNNING}"
2016-10-30T05:00:07Z [INFO] Error retrieving stats for container 0cb648a6fea87b3b493d5b315d0db97be5c1ed983fc56a32fa4b4ad8337242ad: context canceled
2016-10-30T05:00:07Z [INFO] Container 0cb648a6fea87b3b493d5b315d0db97be5c1ed983fc56a32fa4b4ad8337242ad is terminal, stopping stats collection
2016-10-30T05:00:07Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node3 -> STOPPED, Exit 137, , Known Sent: RUNNING"
2016-10-30T05:00:07Z [INFO] Adding event module="eventhandler" change="TaskChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 -> STOPPED, Known Sent: RUNNING"
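
For anyone triaging a similar log, here is a minimal sketch that pulls out the containers the agent stopped with a start error, so the bridge store timeout is easy to spot among the state-change noise. The regular expression is an assumption inferred from the `ContainerChange: ... -> STOPPED, Reason ...` lines in the excerpt above; it is not an official agent log format. `find_start_failures` is a hypothetical helper name.

```python
import re

# Assumed shape, taken from the excerpt above (not an official format):
#   ContainerChange: <task ARN> <container> -> STOPPED, Reason <ErrorName>: <message>
CHANGE_RE = re.compile(
    r"ContainerChange: (?P<task>arn:aws:ecs:\S+) (?P<container>\S+) -> STOPPED, "
    r"Reason (?P<reason>\w+): (?P<message>.*)"
)


def find_start_failures(lines):
    """Return unique (task ARN, container, reason, message) tuples, in order seen."""
    seen = []
    for line in lines:
        match = CHANGE_RE.search(line)
        if match is None:
            continue  # not a STOPPED-with-reason event (e.g. "STOPPED, Exit 137")
        record = (
            match["task"],
            match["container"],
            match["reason"],
            match["message"].strip(),
        )
        # The agent logs the same event several times ("Adding", "Sending").
        if record not in seen:
            seen.append(record)
    return seen


if __name__ == "__main__":
    import sys

    # Usage (hypothetical log path): python scan.py < ecs-agent.log
    for task, container, reason, message in find_start_failures(sys.stdin):
        print(f"{container}: {reason}: {message}")
```

Lines that report `STOPPED, Exit 137` are skipped on purpose, since those are the sibling containers being killed after the first failure rather than the failure itself.
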
@richardpen

@mcmonster Thanks for reporting this. It looks like an issue with Docker rather than the agent; it has been reported here and is related to moby/libnetwork#1374. A possible fix has been created here.

@samuelkarp
Contributor

@mcmonster The fix @richardpen referenced is included in the Docker 1.12.6 version now available in Amazon Linux. Can you test the most recent ECS-optimized AMI and let us know if you're still running into issues?
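
A quick way to check whether a host is already on a Docker version that should carry the fix is sketched below. It assumes the version string has the usual `major.minor.patch` prefix (for example, what `docker version --format '{{.Server.Version}}'` prints), and it assumes that every release numbered at or above 1.12.6 includes the fix, which this thread does not confirm for later release lines. `has_bridge_store_fix` is a hypothetical helper name.

```python
import re

# Version named in this thread as containing the fix.
FIXED_IN = (1, 12, 6)


def parse_version(text):
    """Parse the leading 'major.minor.patch' of a Docker version string."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized Docker version: {text!r}")
    return tuple(int(part) for part in match.groups())


def has_bridge_store_fix(version_text):
    """True if the version is >= 1.12.6.

    Assumption: later versions also carry the fix; verify against the
    release notes for your exact version.
    """
    return parse_version(version_text) >= FIXED_IN


if __name__ == "__main__":
    for version in ("1.11.2", "1.12.3", "1.12.6"):
        print(version, has_bridge_store_fix(version))
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating `1.9.1` as newer than `1.12.6`.
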

@samuelkarp
Contributor

@mcmonster We haven't heard back from you in a while, so I'm going to close this issue for now. Please let us know if the problem recurs and we can look again.
