WARN messages when no Tasks are scheduled #506

miketheman · 2016-08-24T12:58:18Z

During routine cluster management, we tend to bring up extra capacity in our cluster to be ready to accept new scheduled tasks on these instances.

We routinely see a behavior where the instance isn't running any scheduled tasks, yet emits WARN logs that look like this:

 [WARN] Error getting instance metrics: No task metrics to report

Now, I realize this may have something to do with the detection of other containers running on the instance. We run a per-container-instance Agent for Task containers to communicate with via host networking, similar to the approach described in the AWS Blog post.

Is the ECS Agent detecting the other running container, treating the instance as not idle, and then failing to collect task-related metrics because there are no ECS-managed tasks?

ref links:

amazon-ecs-agent/agent/stats/engine.go

Lines 229 to 231 in d5e8c51

    
    func (engine *DockerStatsEngine) isIdle() bool {
    	return len(engine.tasksToContainers) == 0
    }

amazon-ecs-agent/agent/stats/engine.go

Lines 183 to 188 in d5e8c51

    
    if idle {
    	seelog.Debug("Instance is idle. No task metrics to report")
    	fin := true
    	metricsMetadata.Fin = &fin
    	return metricsMetadata, taskMetrics, nil
    }


samuelkarp · 2016-08-24T18:49:23Z

@miketheman We just released 1.12.1 which should have addressed a number of problems related to this. Were you seeing this with 1.12.1 or with a previous version?

miketheman · 2016-08-24T19:09:18Z

Hi @samuelkarp! Indeed, this was observed while bringing up new instances with the latest Agent version.

samuelkarp · 2016-08-24T20:15:29Z

@miketheman Thanks for confirming. Can you share the logs you're seeing? I tried (trivially) to reproduce this with a new instance running our 2016.03.h AMI and I'm not seeing any of those WARNs. If you're not comfortable sharing publicly, can you send them to me at skarp (at) amazon.com?

miketheman · 2016-08-27T00:23:54Z

@samuelkarp Here's an example:

2016-08-27T00:06:14Z [INFO] Creating poll dialer, host: ecs-t-1.us-east-1.amazonaws.com
2016-08-27T00:06:14Z [WARN] Error getting cpu stats, err: No data in the queue, container: &{64fe0843f0cd1c187b56cfe4fcc4a77d87eaede54396c6494702ff9370e84c30}
2016-08-27T00:06:14Z [WARN] Error getting instance metrics: No task metrics to report
2016-08-27T00:11:55Z [INFO] Creating poll dialer, host: ecs-a-1.us-east-1.amazonaws.com

Instance details:

[ec2-user@ip-something ~]$ uname -a
Linux ip-10-240-110-153 4.4.16-27.56.amzn1.x86_64 #1 SMP Fri Aug 12 23:25:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[ec2-user@ip-something ~]$ cat /etc/issue
Amazon Linux AMI release 2016.03
Kernel \r on an \m
[ec2-user@ip-something ~]$ curl http://169.254.169.254/latest/meta-data/ami-id ; echo \n
ami-6bb2d67cn
[ec2-user@ip-something ~]$ rpm -qi ecs-init | grep Version
Version     : 1.12.1

samuelkarp · 2016-08-27T02:50:23Z

@miketheman This looks like maybe we didn't fully fix #478. Can you provide the following information? Some of this might be more sensitive, so you can either send it to me by email at skarp (at) amazon.com or open a case with AWS Support. If there is anything like credentials or auth tokens in the logs (from environment variables or command/entrypoint), please redact them. All of this should come from an instance that is currently affected:

- Full (unfiltered, not truncated) logs for the agent (it would be awesome if they're at debug level)
- The agent state file (located at /var/lib/ecs/data/ecs_agent_data.json)
- The output of docker ps
- The agent log file after running docker kill -s USR1 ecs-agent (this will emit a stack trace into the logs)

I'd like to correlate the container that shows up in the logs (64fe0843f0cd1c187b56cfe4fcc4a77d87eaede54396c6494702ff9370e84c30 in the log above) with the rest of what is happening (like the task it belongs to) and see if we can find what caused the agent to not get data (maybe the container is no longer running?).

miketheman · 2016-08-27T11:12:58Z

@samuelkarp I have sent the logs to your amazon email address.

samuelkarp · 2016-08-29T20:53:17Z

@miketheman Thank you for sending all that information! I think I've narrowed this down to occurring when the agent is disconnected from and reconnects to the websocket it uses for reporting metrics. On a reconnect, it appears that the very first time it attempts to send metric data it emits this warning. I've now been able to reproduce this behavior myself, so we should be able to take it from here. Thank you for reporting this issue!

abramche · 2016-12-08T16:32:53Z

Still reproduces on 1.13.1

samuelkarp · 2016-12-08T21:38:23Z

@EugeneAbramchuk Thanks. We haven't fixed this yet since the only problem here is a spurious WARN message. We'll keep this issue updated as it gets fixed, or, if you're looking for something to contribute, this is a change we'd accept.

To restate what we think is going on a bit more clearly:

1. The ECS agent opens a channel with docker stats
2. The ECS agent opens a connection to TCS (the backend component of ECS that receives metric data)
3. The ECS agent tries to publish metrics to TCS
4. The ECS agent has not received any stats from Docker yet, so the queue is empty
5. Publish aborts with the above error message

There is no impact other than an annoying WARN message. However, there is a case where docker stats is completely broken and the stats queue is always empty. In that case we want to raise the alarm, but to do so we need to define an SLA for docker stats. We don't have one yet, so we cannot tell whether docker stats is working as expected.

spy-tech · 2016-12-19T10:22:21Z

@samuelkarp What's the ETA on this fix?

samuelkarp added the more info needed label Aug 24, 2016

samuelkarp added kind/bug and removed more info needed labels Aug 29, 2016

liwenwu-amazon added a commit to liwenwu-amazon/amazon-ecs-agent that referenced this issue Dec 29, 2016

Fixed aws#506

03fb48c

liwenwu-amazon mentioned this issue Dec 29, 2016

Fixed https://github.com/aws/amazon-ecs-agent/issues/506 #652

Merged

8 tasks

liwenwu-amazon added a commit to liwenwu-amazon/amazon-ecs-agent that referenced this issue Dec 29, 2016

Fixed aws#506

eb96940

liwenwu-amazon added a commit to liwenwu-amazon/amazon-ecs-agent that referenced this issue Jan 3, 2017

Fixed aws#506

0dd2e89

liwenwu-amazon added a commit to liwenwu-amazon/amazon-ecs-agent that referenced this issue Jan 4, 2017

Fixed aws#506

178a672

liwenwu-amazon added a commit to liwenwu-amazon/amazon-ecs-agent that referenced this issue Jan 5, 2017

Fixed aws#506

eb31f63

liwenwu-amazon added a commit to liwenwu-amazon/amazon-ecs-agent that referenced this issue Jan 5, 2017

Fixed aws#506

1508e15

liwenwu-amazon added a commit to liwenwu-amazon/amazon-ecs-agent that referenced this issue Jan 11, 2017

Fixed aws#506

b89d69d

samuelkarp added this to the 1.14.1 milestone Feb 1, 2017

samuelkarp added the pending release label Feb 1, 2017

jhaynes closed this as completed in 1be5420 Mar 21, 2017

cabbruzzese removed the pending release label Apr 21, 2017

jwerak pushed a commit to appuri/amazon-ecs-agent that referenced this issue Jun 8, 2017

Fixed aws#506

f57affa
