ECS agent keeps trying to fetch stats for non existent container #478
Comments
@GeyseR Thanks for reporting this. We're aware of the issue and are working on a fix; we'll let you know when we have an update. As a temporary fix, you can restart the agent by
What about downgrading the ECS agent to a lower version (by using older ECS-Optimized AMIs)?
@GeyseR This issue exists in all agent versions. Currently, restarting the agent is the only workaround. We are working on a fix and will let you know when we have an update.
Might this be related? We are seeing this message in the agent log:
The named container does exist, though. This would not be a problem in itself, but whenever this message is logged, the agent keeps one additional socket connection open, until it eventually hits the maximum of 1024 FDs. This started to happen with the latest Shall I rather open a new issue for this?
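For anyone who wants to watch for the descriptor leak described above, here is a minimal sketch that counts a process's open file descriptors through the Linux procfs layout. The function names and the `proc_root` parameter are hypothetical and exist only for illustration; the 1024 default is the limit quoted in the comment above, and finding the agent's PID is left to the reader.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Return the number of open file descriptors for `pid`.

    Assumes a Linux procfs layout (<proc_root>/<pid>/fd). Reading another
    process's fd directory usually requires root privileges.
    """
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def fd_usage_ratio(pid, limit=1024, proc_root="/proc"):
    """Return the fraction of the descriptor limit currently in use.

    The default limit of 1024 is the figure mentioned in this thread;
    the real limit on a given host may differ.
    """
    return count_open_fds(pid, proc_root) / limit
```

Sampling `fd_usage_ratio` every few minutes and alerting when it climbs steadily would flag the leak before the agent runs out of descriptors.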
@martinrehfeld Thanks for letting us know. This has the same root cause in the agent, so you don't need to open a new issue. We are working on a fix and will let you know when we have an update.
Hi @richardpen! Do you have an estimated date for solving this issue? This has become the most critical issue for our system, because almost every deployment leads to downtime.
Is it advisable to just periodically restart ecs with a cron job until this is fixed?
Yes, this might be a temporary workaround, but yesterday I had a problem starting the ECS agent container after stopping the ecs service. The previous container just hung with a "Removal in progress" status, which prevented the ecs service from starting again.
@GeyseR @lpetre @martinrehfeld We already have a pull request, #482, for this. If you'd like to use a pre-built agent version on your instances before we release our AMI, please send me an email at penyin (at) amazon.com. Thanks,
We've just released 1.12.1, which should fix this issue. Please let us know if you continue to run into problems.
@samuelkarp Is there an ECS AMI associated with this release, or do you recommend customers manually update the agent on each host?
The new ECS AMI is
Thanks, I'll be watching http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html for AMI details.
@jhovell The documentation has been updated with new AMI IDs.
@samuelkarp Don't forget the Marketplace page :)
@ziggythehamster We're coordinating with the Marketplace team to get the listing updated there.
Hi, AWS team. Unfortunately, we hit the same problem on the latest ECS-Optimized AMI with ECS agent v1.12.1 installed. It looks like this issue needs to be reopened.
@GeyseR It's still possible to see a few error messages in the logs, because the container stop event may be handled right after metrics are collected (every second). But the agent shouldn't keep fetching metrics for the same stopped container. Did you see the same container ID in the error logs? If the same container ID keeps showing the error, could you send me the agent logs here or at penyin (at) amazon.com?
Hi @richardpen! Unfortunately, we have exactly this problem: high CPU load over a long period of time and a lot of identical log messages (docker + ecs). I'm not sure which logs you want to receive.
Here is the count of lines with identical messages from one of the log files:
Let me know if you need more info...
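A count like the one mentioned above can be produced with a short script. The sketch below is only an illustration: it assumes each log line starts with an ISO-8601 timestamp, which may not match the real agent or docker log format, and `count_messages` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line prefix: an ISO-8601 timestamp followed by whitespace.
# Adjust the pattern if the real log format differs.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?\s+")


def count_messages(lines):
    """Count identical messages, ignoring the leading timestamp on each line."""
    counts = Counter()
    for line in lines:
        message = TIMESTAMP.sub("", line.strip())
        if message:  # skip blank lines
            counts[message] += 1
    return counts
```

Passing an open log file to `count_messages` and calling `.most_common(10)` on the result lists the ten most frequent messages. If the same container ID dominates that list, that is the pattern asked about earlier in the thread.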
@GeyseR Can you open a case with AWS Support? It sounds like there might be something going on that's specific to your setup and we'd like to dig in with you in a setting where we can discuss the specifics of your situation.
Hi!
We have a weird problem with the ECS agent on some of the ECS instances in our clusters.
The ECS agent keeps trying indefinitely to fetch stats for a non-existent container.
It spams the ecs log with messages:
(more than 754 MB in an hour)
... and the docker logs with
(close to 1 GB in an hour)
Also, the docker and agent processes consume close to 100% CPU.
We have the 1.11.0 agent on one machine and the latest 1.11.1 agent on another (in another cluster).
The only change we have made to the agent settings is setting the ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION parameter to 15m.
This looks like a very critical issue (at least for our company).
How can we avoid this? Or will it be fixed?
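To put a number on log growth like the figures quoted above, a rough sketch is to sample the log file size twice and extrapolate to an hourly rate. The function names here are hypothetical, the log path is left as a parameter, and log rotation between the two samples would produce a misleading (possibly negative) result.

```python
import os
import time


def growth_mb_per_hour(size_before, size_after, elapsed_seconds):
    """Extrapolate a byte-size difference to megabytes per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta_mb = (size_after - size_before) / (1024 * 1024)
    return delta_mb * 3600 / elapsed_seconds


def measure_log_growth(path, interval_seconds=60):
    """Sample the size of the file at `path` twice and return MB per hour."""
    before = os.path.getsize(path)
    time.sleep(interval_seconds)
    after = os.path.getsize(path)
    return growth_mb_per_hour(before, after, interval_seconds)
```

A healthy agent's log should grow far more slowly than the rates reported here, so a sustained reading in the hundreds of MB per hour suggests the stats loop is running.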