Agent Running But Disconnected #781
Comments
Continuing to see this behavior repeatedly. Looks like if left alone the agent reconnects within about 2-5 minutes.
Hi @tweibley, as I recently commented on #730,
If you're seeing sustained long disconnection periods, can you please share the full Agent logs to help us debug the root cause? You can use the ECS Logs Collector and share the artifact with us by emailing it to aithal at amazon dot com. Thanks,
I'm happy to collect some logs for you. In the example logs I provided above, is the timeline really representative of the rate at which you expect those disconnects to occur (to force re-auth)?
It's hard to tell, because the following lines could also be coming from 2 different websocket connections:
The ECS Agent establishes two connections, one with the backend communication service and the other with the metrics endpoint, and both follow the same pattern. It's hard to tell whether the first log line is coming from the metrics connection or the communication service connection, and having complete Agent logs helps in that respect. Thanks again for being willing to share the logs.
Hi @tweibley, thank you for sending those logs. They were extremely helpful and provide some indications as to why you're seeing sporadic long disconnection times from the backend Communication Service. After correlating the Agent logs for the two Container Instances with our backend wire logs, I see 3 distinct occurrences of Agent remaining disconnected from the backend >
I'm marking this as a bug and will update the thread when I have more details to share. Thanks,
We just ran into this issue with the 1.14.2 agent. Here are both the ecs-init and ecs-agent logs from around this time.
---- ecs-init.log ----
---- ecs-agent.log ----
2017-06-12T20:07:47Z [INFO] Starting Agent: Amazon ECS Agent - v1.14.2 (35acca3)
Hi @samuelkarp,
2017-07-07T17:06:08Z [INFO] TaskHandler, Sending task change: TaskChange: arn:aws:ecs:eu-west-1:136432914479:task/90369435-dc1a-4b5d-a21e-608d9369266e -> STOPPED, Known Sent: NONE
Hey @abozhinov - What do the agent logs show after the If you are seeing extended periods of disconnection, please collect full agent logs using the ECS logs collector and email them to me at adnkha at amazon dot com. Thanks!
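For anyone who wants a rough picture of timing before sending logs over, here is a minimal Python sketch. It assumes each agent log line starts with a UTC timestamp in the form seen in this thread (e.g. 2017-06-12T20:07:47Z). The function names and the idea of passing in a search string are illustrative assumptions, not part of the agent or the ECS logs collector. You supply whatever substring you see in your own logs around a disconnect or reconnect, and it reports how many seconds passed between consecutive matching lines.

```python
from datetime import datetime, timezone

# Timestamp layout assumed from the log lines quoted in this thread.
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(line):
    """Return the leading UTC timestamp of a log line, or None if absent."""
    token = line.split(" ", 1)[0]
    try:
        return datetime.strptime(token, TS_FORMAT).replace(tzinfo=timezone.utc)
    except ValueError:
        return None


def match_intervals(lines, needle):
    """Yield (timestamp, seconds_since_previous_match) for lines containing needle.

    The first match has no predecessor, so its interval is None.
    Lines without a parseable timestamp are skipped.
    """
    previous = None
    for line in lines:
        if needle not in line:
            continue
        ts = parse_timestamp(line)
        if ts is None:
            continue
        gap = None if previous is None else (ts - previous).total_seconds()
        yield ts, gap
        previous = ts
```

To use it, open the agent log file and pass the file object as `lines` together with a substring taken from your own log output. Large intervals between matches only show where to look more closely; they do not by themselves prove the agent was disconnected for that whole time.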
@abozhinov @bpuglisi @tweibley We have improved this in the newly released agent v1.14.4, please upgrade to the new agent version. I'm closing this for now, feel free to reopen if you run into this issue in the future.
We keep seeing various agents go AWOL (the agent is active but it's not connected anymore). I caught one in that state tonight and got a dump from it, with a few relevant log entries too: https://gist.github.com/tweibley/1e7c8ecacd15f813f276c62c08472375.
The only other thing I noticed is that we have a bunch of log entries like:
So far I haven't been able to figure out exactly why this is happening... Doesn't look like we are anywhere near max open files soft/hard limit, etc.
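In case it helps anyone else rule out the open-files limit, here is a minimal Python sketch of one way to compare a process's open file descriptor count against its soft and hard limits on Linux. It reads /proc/<pid>/fd and /proc/<pid>/limits; the function names are made up for illustration, and you need permission to read the target process's /proc entries (typically root for the agent's process).

```python
import os


def _to_limit(value):
    """Convert a limits field to int, or None when it reads 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Extract (soft, hard) from the contents of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: "Max open files  <soft>  <hard>  files"
            fields = line.split()
            return _to_limit(fields[3]), _to_limit(fields[4])
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid="self"):
    """Return (open_fd_count, soft_limit, hard_limit) for a Linux process."""
    proc = f"/proc/{pid}"
    # Each entry in /proc/<pid>/fd is one open descriptor. For pid="self"
    # the count can be off by one, because listing the directory briefly
    # opens a descriptor of its own.
    open_fds = len(os.listdir(f"{proc}/fd"))
    with open(f"{proc}/limits") as f:
        soft, hard = parse_max_open_files(f.read())
    return open_fds, soft, hard
```

Calling `fd_usage(pid)` with the agent's process ID gives the three numbers side by side. A count well below the soft limit suggests descriptor exhaustion is not the cause, though it is only a snapshot at the moment of the call.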