I have started seeing this issue intermittently in the latest agent release. I'm wondering if this is a timeout misconfiguration rather than a real bug - our images are quite large and it's not inconceivable for an image pull to take 5+ minutes.
Any guidance would be helpful. Thanks!
2016-10-30T04:59:16Z [INFO] Created docker container for task pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (CREATED->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]: gpu_compute_node(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (CREATED->RUNNING) -> 446581d35eebc464156016c6dadc23175c8d3174dfe1fa34c025ba3d118e297e
2016-10-30T04:59:16Z [INFO] Starting container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (CREATED->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]" container="gpu_compute_node4(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (CREATED->RUNNING)"
2016-10-30T04:59:16Z [INFO] Starting container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (CREATED->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]" container="gpu_compute_node2(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (CREATED->RUNNING)"
2016-10-30T04:59:16Z [INFO] Starting container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (CREATED->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]" container="gpu_compute_node3(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (CREATED->RUNNING)"
2016-10-30T04:59:16Z [INFO] Starting container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (CREATED->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]" container="gpu_compute_node(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (CREATED->RUNNING)"
2016-10-30T04:59:17Z [INFO] Saving state! module="statemanager"
2016-10-30T04:59:36Z [INFO] Error transitioning container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (CREATED->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]" container="gpu_compute_node2(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (CREATED->RUNNING)" state="RUNNING"
2016-10-30T04:59:36Z [WARN] Error with docker; stopping container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->RUNNING) Containers: [gpu_compute_node2 (RUNNING->RUNNING),gpu_compute_node3 (CREATED->RUNNING),gpu_compute_node (CREATED->RUNNING),gpu_compute_node4 (CREATED->RUNNING),]" container="gpu_compute_node2(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->RUNNING)" err="API error (500): failed to create endpoint ecs-pre_production_gpu_compute_node-13-gpucomputenode2-dac8ccb3988fe0fd6a00 on network bridge: failed to save bridge endpoint 649374f to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout
"
2016-10-30T04:59:37Z [INFO] Saving state! module="statemanager"
2016-10-30T04:59:37Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node4 -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node4 -> RUNNING, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node4 -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node3 -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Redundant container state change for task pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (CREATED->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (CREATED->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]: gpu_compute_node4(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED) to RUNNING, but already RUNNING
2016-10-30T04:59:37Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node3 -> RUNNING, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node3 -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Task change event module="TaskEngine" event="{TaskArn:arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 Status:RUNNING Reason: SentStatus:NONE}"
2016-10-30T04:59:37Z [INFO] Redundant container state change for task pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]: gpu_compute_node(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED) to RUNNING, but already RUNNING
2016-10-30T04:59:37Z [INFO] Stopping container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]" container="gpu_compute_node2(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED)"
2016-10-30T04:59:37Z [INFO] Stopping container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]" container="gpu_compute_node(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED)"
2016-10-30T04:59:37Z [INFO] Adding event module="eventhandler" change="TaskChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Stopping container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]" container="gpu_compute_node3(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED)"
2016-10-30T04:59:37Z [INFO] Redundant container state change for task pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]: gpu_compute_node3(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED) to RUNNING, but already RUNNING
2016-10-30T04:59:37Z [INFO] Stopping container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]" container="gpu_compute_node4(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED)"
2016-10-30T04:59:37Z [INFO] Error transitioning container module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (RUNNING->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]" container="gpu_compute_node2(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (RUNNING->STOPPED)" state="STOPPED"
2016-10-30T04:59:37Z [INFO] Error for 'docker stop' of container; assuming it's stopped anyways module="TaskEngine" task="pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (STOPPED->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (RUNNING->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]"
2016-10-30T04:59:37Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node -> RUNNING, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node2 -> STOPPED, Reason CannotStartContainerError: API error (500): failed to create endpoint ecs-pre_production_gpu_compute_node-13-gpucomputenode2-dac8ccb3988fe0fd6a00 on network bridge: failed to save bridge endpoint 649374f to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout
, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Sending task change module="eventhandler" event="TaskChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 -> RUNNING, Known Sent: NONE" change="TaskChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 -> RUNNING, Known Sent: NONE"
2016-10-30T04:59:37Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node2 -> STOPPED, Reason CannotStartContainerError: API error (500): failed to create endpoint ecs-pre_production_gpu_compute_node-13-gpucomputenode2-dac8ccb3988fe0fd6a00 on network bridge: failed to save bridge endpoint 649374f to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout
, Known Sent: NONE" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node2 -> STOPPED, Reason CannotStartContainerError: API error (500): failed to create endpoint ecs-pre_production_gpu_compute_node-13-gpucomputenode2-dac8ccb3988fe0fd6a00 on network bridge: failed to save bridge endpoint 649374f to store: failed to update bridge store for object type *bridge.bridgeEndpoint: timeout
, Known Sent: NONE"
2016-10-30T04:59:47Z [INFO] Saving state! module="statemanager"
2016-10-30T05:00:07Z [INFO] Error retrieving stats for container 446581d35eebc464156016c6dadc23175c8d3174dfe1fa34c025ba3d118e297e: context canceled
2016-10-30T05:00:07Z [INFO] Container 446581d35eebc464156016c6dadc23175c8d3174dfe1fa34c025ba3d118e297e is terminal, stopping stats collection
2016-10-30T05:00:07Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node -> STOPPED, Exit 137, , Known Sent: RUNNING"
2016-10-30T05:00:07Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node -> STOPPED, Exit 137, , Known Sent: RUNNING" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node -> STOPPED, Exit 137, , Known Sent: RUNNING"
2016-10-30T05:00:07Z [INFO] Saving state! module="statemanager"
2016-10-30T05:00:07Z [INFO] Redundant container state change for task pre_production_gpu_compute_node:13 arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137, Status: (RUNNING->STOPPED) Containers: [gpu_compute_node2 (STOPPED->STOPPED),gpu_compute_node3 (RUNNING->STOPPED),gpu_compute_node (STOPPED->STOPPED),gpu_compute_node4 (RUNNING->STOPPED),]: gpu_compute_node(943938684455.dkr.ecr.us-west-2.amazonaws.com/umaptechnologies/gpu_compute_node:pre_production) (STOPPED->STOPPED) - Exit: 137 to STOPPED, but already STOPPED
2016-10-30T05:00:07Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node4 -> STOPPED, Exit 137, , Known Sent: RUNNING"
2016-10-30T05:00:07Z [INFO] Sending container change module="eventhandler" event="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node4 -> STOPPED, Exit 137, , Known Sent: RUNNING" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node4 -> STOPPED, Exit 137, , Known Sent: RUNNING"
2016-10-30T05:00:07Z [INFO] Container 24f96681c9076f379cbfe9952d3ca12e597459ab24afb121558dc2278c3f252d is terminal, stopping stats collection
2016-10-30T05:00:07Z [INFO] Error retrieving stats for container 24f96681c9076f379cbfe9952d3ca12e597459ab24afb121558dc2278c3f252d: context canceled
2016-10-30T05:00:07Z [INFO] Task change event module="TaskEngine" event="{TaskArn:arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 Status:STOPPED Reason: SentStatus:RUNNING}"
2016-10-30T05:00:07Z [INFO] Error retrieving stats for container 0cb648a6fea87b3b493d5b315d0db97be5c1ed983fc56a32fa4b4ad8337242ad: context canceled
2016-10-30T05:00:07Z [INFO] Container 0cb648a6fea87b3b493d5b315d0db97be5c1ed983fc56a32fa4b4ad8337242ad is terminal, stopping stats collection
2016-10-30T05:00:07Z [INFO] Adding event module="eventhandler" change="ContainerChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 gpu_compute_node3 -> STOPPED, Exit 137, , Known Sent: RUNNING"
2016-10-30T05:00:07Z [INFO] Adding event module="eventhandler" change="TaskChange: arn:aws:ecs:us-west-2:943938684455:task/07ed4685-9d02-4885-a58b-c1c108558137 -> STOPPED, Known Sent: RUNNING"
@mcmonster Thanks for reporting this. It looks like an issue with Docker rather than with the agent: it has been reported here and is related to moby/libnetwork#1374. A possible fix has been created here.
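For anyone checking whether other failed tasks share this signature, here is a minimal Python sketch that scans agent log lines for the `CannotStartContainerError` stop event carrying the bridge store timeout. The function name and regular expression are illustrative assumptions based only on the log format pasted above; they are not part of the agent.

```python
import re

# Pattern derived from the "Adding event" lines in the log above (an assumption
# about the format; other agent versions may log differently).
BRIDGE_TIMEOUT_EVENT = re.compile(
    r"^(\S+) \[\w+\] Adding event .* (\S+) -> STOPPED, "
    r"Reason CannotStartContainerError: .*failed to update bridge store.*timeout"
)


def find_bridge_store_timeouts(lines):
    """Return (timestamp, container_name) pairs for containers that failed to
    start because of the bridge store timeout."""
    hits = []
    for line in lines:
        match = BRIDGE_TIMEOUT_EVENT.search(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits
```

Only the "Adding event" lines are matched, so the "Sending container change" lines that repeat the same error are not counted twice.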
@mcmonster The fix @richardpen referenced is included in the Docker 1.12.6 version now available in Amazon Linux. Can you test the most recent ECS-optimized AMI and let us know if you're still running into issues?
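A quick way to confirm that an instance is on a Docker version at or above 1.12.6 is to compare the reported version string, for example the output of `docker version --format '{{.Server.Version}}'`. The sketch below is a rough check only: the helper names are hypothetical, and it assumes that every release after 1.12.6 also carries the fix.

```python
import re

# Version named in the comment above as containing the fix.
FIXED_VERSION = (1, 12, 6)


def parse_docker_version(text):
    """Extract (major, minor, patch) from a string such as '1.12.6' or
    'Docker version 1.12.6, build ...'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def has_bridge_store_fix(version_text):
    """True if the version is 1.12.6 or later (assumed to include the fix)."""
    return parse_docker_version(version_text) >= FIXED_VERSION
```

Any trailing suffix after the patch number (such as a build tag) is ignored by the parser.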
@mcmonster We haven't heard back from you in a while, so I'm going to close this issue for now. Please let us know if the problem recurs and we can look again.