ECS agent keeps trying to fetch stats for non existent container #478

Closed
GeyseR opened this issue Aug 11, 2016 · 20 comments

GeyseR commented Aug 11, 2016

Hi!

We have a weird problem with the ecs-agent on some of the ECS instances in our clusters.
The ECS agent keeps trying indefinitely to fetch stats for a non-existent container.
It spams the ECS log with messages like:

2016-08-11T15:35:35Z [WARN] Error retrieving stats for container a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e: No such container: a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e
2016-08-11T15:35:35Z [WARN] Error retrieving stats for container a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e: No such container: a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e
2016-08-11T15:35:35Z [WARN] Error retrieving stats for container a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e: No such container: a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e

(more than 754 MB an hour)
... and the Docker log with:

time="2016-08-11T15:48:09.729334949Z" level=error msg="Handler for GET /v1.17/containers/a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e/stats returned error: No such container: a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e" 
time="2016-08-11T15:48:09.729642368Z" level=error msg="Handler for GET /v1.17/containers/a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e/stats returned error: No such container: a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e" 
time="2016-08-11T15:48:09.729956876Z" level=error msg="Handler for GET /v1.17/containers/a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e/stats returned error: No such container: a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e" 
time="2016-08-11T15:48:09.730276370Z" level=error msg="Handler for GET /v1.17/containers/a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e/stats returned error: No such container: a6366dc2480e5516eae7a91c2696a11e4b390e7005d9d867452a65be7e58233e" 

(close to 1 GB an hour)

The docker and agent processes also consume close to 100% CPU.

We have the 1.11.0 agent on one machine and the latest 1.11.1 agent on another (in a different cluster).

The only change we have made to the agent settings is that the ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION parameter was set to 15m.
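
For reference, this parameter lives in /etc/ecs/ecs.config on the ECS-optimized AMI; a minimal sketch of the file (the cluster name is a placeholder):

$ cat /etc/ecs/ecs.config
ECS_CLUSTER=our-cluster
ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=15m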

This looks like a very critical issue (at least for our company).
How can we avoid it? Or will it be fixed?

@richardpen richardpen self-assigned this Aug 11, 2016
@richardpen

@GeyseR Thanks for reporting this. We're aware of the issue and are working on a fix; we'll let you know when we have an update. As a temporary workaround, you can restart the agent with sudo stop ecs and sudo start ecs to get rid of the error.
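
For clarity, a sketch of that workaround on the ECS-optimized AMI (the docker ps check is optional and assumes the agent container keeps its default name, ecs-agent):

$ sudo stop ecs
$ sudo start ecs
$ sudo docker ps --filter name=ecs-agent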


GeyseR commented Aug 11, 2016

What about downgrading the ECS agent to an older version (by using older ECS-Optimized AMIs)?
Do you know in which version this bug first appeared?

@richardpen

@GeyseR This issue exists in all agent versions. Currently, restarting the agent is the only workaround. We are working on a fix and will let you know when we have an update.

@martinrehfeld

Might this be related? We are seeing this message in the agent log:

2016-08-12T08:53:05Z [WARN] Error retrieving stats for container e278c6d145aaf164dfeec6a030e86b37f3ff30ee67f9353300d23999a114a324: io: read/write on closed pipe

The named container does exist, though.

This would not be a problem in itself, but whenever this message is logged, the agent keeps one additional socket connection open until it eventually hits the maximum of 1024 file descriptors.

This started to happen with the latest amzn-ami-2016.03.f-amazon-ecs-optimized AMI (we had amzn-ami-2016.03.c-amazon-ecs-optimized running before, and it did not show this problem).
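
In case it helps while reproducing, a rough way to watch the agent's open descriptors from the host (this assumes the agent container is named ecs-agent, the default on the ECS-optimized AMI):

$ sudo ls /proc/$(docker inspect --format '{{.State.Pid}}' ecs-agent)/fd | wc -l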

Should I open a new issue for this instead?

@richardpen

@martinrehfeld Thanks for letting us know; this has the same root cause in the agent, so you don't need to open a new issue. We are working on a fix and will let you know when we have an update.


GeyseR commented Aug 15, 2016

Hi @richardpen!

Do you have an estimated date for a fix? This has become the most critical issue for our system, because almost every deployment leads to downtime.


lpetre commented Aug 15, 2016

Is it advisable to just periodically restart ecs with a cron job until this is fixed?
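
Something like the following, maybe (just a sketch; the file path and the every-six-hours schedule are arbitrary, and it assumes the upstart job is named ecs as in the restart commands above):

$ cat /etc/cron.d/restart-ecs-agent
0 */6 * * * root /sbin/stop ecs; /sbin/start ecs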


GeyseR commented Aug 15, 2016

Yes, this might work as a temporary workaround, but yesterday I had a problem starting the ecs-agent container after stopping the ecs service. The previous container hung in the "Removal in progress" status and prevented the ecs service from starting again.
In any case, it would be better to fix this in the service itself, because as I understand it this can affect many users who may not know about such workarounds.

@richardpen

@GeyseR @lpetre @martinrehfeld We already have a pull request #482 for this. If you'd like to use a pre-built agent version on your instance before we release our AMI, please send me an email at: penyin (at) amazon.com.

Thanks,
Peng

@samuelkarp

We've just released 1.12.1, which should fix this issue. Please let us know if you continue to run into problems.


jhovell commented Aug 22, 2016

@samuelkarp Is there an ECS AMI associated with this release, or do you recommend that customers manually update the agent on each host?

@samuelkarp

The new ECS AMI is amzn-ami-2016.03.h-amazon-ecs-optimized. We'll be updating our documentation shortly.


jhovell commented Aug 22, 2016

Thanks, I'll be watching http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html for AMI details.

@samuelkarp

@jhovell The documentation has been updated with new AMI IDs.

@ziggythehamster

@samuelkarp Don't forget the Marketplace page :)

@samuelkarp

@ziggythehamster We're coordinating with the Marketplace team to get the listing updated there.


GeyseR commented Aug 25, 2016

Hi, AWS team.

Unfortunately, we hit the same problem on the latest ECS-Optimized AMI with ECS agent v1.12.1 installed.

It looks like this issue needs to be reopened.

@richardpen

@GeyseR It's still possible to see a few error messages in the logs, since the container stop event may be handled right after metrics are collected (every second). But the agent shouldn't keep fetching metrics for the same stopped container. Do you see the same container id in the error logs? If the same container id keeps appearing, could you send me the agent logs here or at penyin (at) amazon.com?
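
If it helps, a quick way to count the error messages per container id (assuming the default log location on the ECS-optimized AMI):

$ grep -ho 'Error retrieving stats for container [0-9a-f]*' /var/log/ecs/ecs-agent.log.* | sort | uniq -c | sort -rn | head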


GeyseR commented Aug 25, 2016

Hi @richardpen!

Unfortunately, we have exactly that problem: high CPU load over a long period of time and a lot of identical log messages (docker + ecs).

I'm not sure which logs you want to receive.
We have 5.2 GB of ECS logs in total:

$ du -h /var/log/ecs/
5.2G    /var/log/ecs/

Here is the count of lines with identical messages from one of the log files:

$ cat /var/log/ecs/ecs-agent.log.2016-08-25-17 | grep 'Error retrieving stats for container 8ef5467a191faf688374f9f33135443274e28d0827152675203c0c938dd41d40' | wc -l
1833938
$ cat /var/log/ecs/ecs-agent.log.2016-08-25-17 | wc -l
1834011

Let me know if you need more info...

@samuelkarp

@GeyseR Can you open a case with AWS Support? It sounds like there might be something going on that's specific to your setup and we'd like to dig in with you in a setting where we can discuss the specifics of your situation.

fierlion pushed a commit to fierlion/amazon-ecs-agent that referenced this issue Mar 7, 2022
* Include amazon-ecs-volume-plugin and startup scripts in Debian Package (aws#450)

* add amazon-ecs-volume-plugin to rpm generic package (aws#462)

* Fix the issue that potentially curl target not present in bucket during release

* Update copyrights

Co-authored-by: Dennis Conrad <dennis.conrad@sainsburys.co.uk>