StartContainer edge case #1708

Closed
petderek opened this issue Nov 29, 2018 · 7 comments

Comments

@petderek
Contributor

Summary

We've observed that go-dockerclient occasionally returns an EOF instead of a complete response from Docker. Our StartContainer logic does not account for this, so the task gets marked as 'failed to start' even though the containers may actually be running. The agent should be modified to check for this failure state.

Expected Behavior

Either:

  1. Containers do not start, task gets marked as failed.

or

  1. Containers start, the agent verifies that they actually started, and the task gets marked as successful.

Observed Behavior

  • Docker API returns invalid response
  • Docker daemon starts container
  • Agent marks task as failed to start
  • Container runs anyway, agent does not track or stop it

Supporting Log Snippets

Logs will contain messaging similar to:

error transitioning container [<container_name>] to [RUNNING]: Post http://unix.sock/v1.19/containers/<container_id>/start EOF 
@mfortin

mfortin commented Jan 8, 2019

Faced this bug again today. Is there an ETA?

@yhlee-aws
Contributor

The latest agent release (v1.24.0) has replaced go-dockerclient with Docker's official SDK. @mfortin, is upgrading to the latest release an option for you?

@mfortin

mfortin commented Jan 13, 2019

@yunhee-l It is. I will be testing it out starting Monday. Thanks!

@mfortin

mfortin commented Jan 15, 2019

@yunhee-l I did. Now this is new:

CannotStartContainerError: error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.19/containers/eaeca3d1c68c067c0963a2a48785a65b1b5961acd9c8c6f72c7e316afef11442/start: EOF

@mfortin

mfortin commented Jan 15, 2019

@yunhee-l I have grabbed the logs using ecs-logs-collector if you want to have a look.

@adnxn
Contributor

adnxn commented Jan 15, 2019

> I have grabbed the logs using ecs-logs-collector if you want to have a look.

@mfortin: mind sending the logs to adnkha at amazon dot com?

@adnxn
Contributor

adnxn commented Nov 1, 2019

Closing; the fix was released with version v1.32.1.

@adnxn adnxn closed this as completed Nov 1, 2019