ECS Cluster metric reporting does not work with Docker 1.11 #387

Closed
ryanwalls opened this issue Apr 25, 2016 · 16 comments

@ryanwalls

ryanwalls commented Apr 25, 2016

As soon as we updated to Docker 1.11, our ECS cluster stopped reporting metrics.

[Attached image: screenshot 2016-04-25 11 05 12]

@samuelkarp
Contributor

@ryanwalls Can you provide the following information?

  • ECS Agent version (output of curl localhost:51678/v1/metadata)
  • Docker version (output of sudo docker version)
  • Docker debugging info (output of sudo docker info)
  • EC2 AMI ID
  • Relevant log files (usually in /var/log/ecs and /var/log/docker)
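For anyone gathering the same details, here is a minimal Python sketch that collects the first three items in one pass. The endpoint and the two `docker` commands are the ones listed above; the function names and the report layout are illustrative assumptions, not part of the ECS Agent or Docker tooling.

```python
import subprocess
import urllib.request

# Agent introspection endpoint and Docker commands named in the list above.
METADATA_URL = "http://localhost:51678/v1/metadata"
COMMANDS = [["sudo", "docker", "version"], ["sudo", "docker", "info"]]


def fetch_metadata(url=METADATA_URL, timeout=5):
    """Return the raw body of the agent metadata endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def run_command(argv):
    """Run a command and return its combined stdout/stderr as text."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def collect(fetch=fetch_metadata, run=run_command):
    """Build a {title: output} report; fetch/run are injectable (hypothetical hooks)."""
    report = {"agent metadata": fetch()}
    for argv in COMMANDS:
        report[" ".join(argv)] = run(argv)
    return report
```

Calling `collect()` on the container instance and printing each entry gives text that can be pasted into a reply. The AMI ID and the log files under /var/log/ecs and /var/log/docker still have to be gathered separately.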

@ryanwalls
Author

ryanwalls commented Apr 25, 2016

@samuelkarp

Docker version is probably where we need to stop looking. We use the https://get.docker.com script to install our docker engine... which grabs the latest version all the time. They released 1.11 on 4/12, https://github.com/docker/docker/wiki/Engine-1.11.0. So I would say that's more than a coincidence.

Sorry for bad description... we did update our architecture. I'll update the title.

@ryanwalls changed the title from "ECS Cluster has been reporting 0% for all metrics since 4/12/16" to "ECS Cluster metric reporting does not work with Docker 1.11" on Apr 25, 2016
@ryanwalls
Author

Agent version: 1.8.1
Docker version: 1.11

I can provide the other log files if wanted. I imagine the Docker 1.11 thing is a known issue.

@samuelkarp
Contributor

Docker 1.11.0 breaks our CloudWatch metrics feature as the runc and containerd integration changed some of the things we use to power that feature. I'll mark this as a bug so we can track publicly; we've already been looking at paths forward here.

@ryanwalls
Author

@samuelkarp Great. Thanks for the quick response.

@samuelkarp
Contributor

@ryanwalls There is a workaround that appears to work for me, but it's still not ideal. You'll want to verify on your system that these are the right settings.

In Docker versions prior to 1.11, Docker would maintain state information about containers in a directory like /var/run/docker/execdriver/native, which we'd mount into the agent container as /var/lib/docker/execdriver/native. Starting with 1.11, the directory used by Docker changed. On Ubuntu 14.04 with the Docker project's build of Docker 1.11, the directory looks to now be located at /run/runc. I was able to get the metrics working by changing the mount to /run/runc:/var/lib/docker/execdriver/native:ro. Please let us know if this works for you.
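As a rough illustration of the workaround, the sketch below picks the host state directory from the Docker version and builds the matching volume argument for the agent container. The two host paths and the in-container path come from the comment above; the helper names are hypothetical, and the `/run/runc` location is only what was observed on Ubuntu 14.04 with the Docker project's build, so it may differ on other systems.

```python
# Host paths as described in the comment above (assumptions for other distros).
LEGACY_STATE_DIR = "/var/run/docker/execdriver/native"  # Docker before 1.11
RUNC_STATE_DIR = "/run/runc"                            # Docker 1.11 on Ubuntu 14.04
AGENT_STATE_DIR = "/var/lib/docker/execdriver/native"   # where the agent expects it


def parse_version(version):
    """Turn '1.11.1' into (1, 11, 1); a suffix such as '-rc1' is ignored."""
    numeric = version.split("-")[0]
    return tuple(int(piece) for piece in numeric.split("."))


def state_dir_volume(docker_version):
    """Return the read-only volume spec 'host:container:ro' for the agent."""
    if parse_version(docker_version) >= (1, 11):
        host_dir = RUNC_STATE_DIR
    else:
        host_dir = LEGACY_STATE_DIR
    return f"{host_dir}:{AGENT_STATE_DIR}:ro"


print(state_dir_volume("1.11.0"))
print(state_dir_volume("1.9.1"))
```

The returned string is what would follow `-v` in the agent's `docker run` command. Checking that the chosen host directory actually exists before starting the agent is a sensible extra step, since the location is not guaranteed.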

@ryanwalls
Author

@samuelkarp Thanks for the workaround! It worked for us on our Ubuntu instances. Metrics are coming in again.

@stannnous

We have the same problem since upgrading from Docker 1.9 on Ubuntu 14.04 to Docker 1.11 on Ubuntu 16.04 but the workaround doesn't work for us.

        "Mounts": [
            {
                "Source": "/var/log/ecs",
                "Destination": "/log",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/var/lib/ecs/data",
                "Destination": "/data",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/sys/fs/cgroup",
                "Destination": "/sys/fs/cgroup",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Source": "/run/runc",
                "Destination": "/var/lib/docker/execdriver/native",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Source": "/var/run/docker.sock",
                "Destination": "/var/run/docker.sock",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],

The ECS log (attached) contains a lot of:

2016-05-20T13:31:44Z [WARN] Error getting cpu stats module="stats" err="No data in the queue" container="&{DockerID:c3738c63b52a4a6859708558fcdb0c2b3a29310302202a524b5767d7ea088e62}"
2016-05-20T13:31:44Z [WARN] Error getting instance metrics module="tcs client" err="No task metrics to report"

and

2016-05-20T13:32:24Z [ERROR] Error getting message from ws backend module="ws client" err="websocket: close 1002 Channel long idle: No message is received, close the channel"
ecs-agent-log.txt

@rodlogic

Same here. The workaround doesn't work for us either.

@djenriquez

djenriquez commented May 25, 2016

Confirmed the following mounts fixed the issue for ECS-Agent 1.9.0 running on Docker 1.11.1:

        "Mounts": [
            {
                "Source": "/var/lib/ecs/data",
                "Destination": "/data",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/sys/fs/cgroup",
                "Destination": "/sys/fs/cgroup",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Source": "/run/runc",
                "Destination": "/var/lib/docker/execdriver/native",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Source": "/var/run/docker.sock",
                "Destination": "/var/run/docker.sock",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/var/log/ecs",
                "Destination": "/log",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ]
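To compare mount lists like the ones posted in this thread without reading them by eye, a small Python sketch can look up the state-directory mount in a `Mounts` array and report its source and mode. The field names match the JSON shown above; the function names and the trimmed sample are illustrative assumptions.

```python
import json

# In-container path the agent reads container state from (per this thread).
EXPECTED_DESTINATION = "/var/lib/docker/execdriver/native"


def find_mount(mounts, destination):
    """Return the first mount whose Destination matches, or None."""
    for mount in mounts:
        if mount.get("Destination") == destination:
            return mount
    return None


def describe_state_mount(mounts):
    """Summarize the state-directory mount as 'source (ro|rw)' or 'missing'."""
    mount = find_mount(mounts, EXPECTED_DESTINATION)
    if mount is None:
        return "missing"
    mode = "rw" if mount.get("RW") else "ro"
    return f'{mount["Source"]} ({mode})'


# Trimmed sample taken from the mount list above.
SAMPLE = json.loads("""
[
    {"Source": "/sys/fs/cgroup", "Destination": "/sys/fs/cgroup",
     "Mode": "ro", "RW": false, "Propagation": "rprivate"},
    {"Source": "/run/runc", "Destination": "/var/lib/docker/execdriver/native",
     "Mode": "ro", "RW": false, "Propagation": "rprivate"}
]
""")

print(describe_state_mount(SAMPLE))
```

A result of `missing`, or a source other than the directory Docker really writes to on that host, would point at the mount rather than at the agent. An identical result on a host where metrics still fail suggests the cause lies elsewhere.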

@stannnous

@djenriquez Those are the same mounts we have, also using ECS-Agent 1.9.0 on Docker 1.11.1, but it doesn't work for us. Could the host OS be playing a role here? We're running Ubuntu 16.04. What did you test on?

@asmarques

Metric reporting is also not working on Ubuntu 16.04 with Docker 1.10.3 and ECS Agent 1.9.0 despite /var/lib/docker/execdriver/native being correctly mounted in the container.

The following message shows up in the logs:

2016-05-26T09:28:59Z [DEBUG] Invalid container statitistics reported, got number of cores = 0
2016-05-26T09:28:59Z [DEBUG] Error getting stats module="stats" error="Invalid container statistics reported" contianer="&{containerMetadata:0xc20839dc80 ctx:0xc20836e9c0 cancel:0x4ff240 statePath:/var/lib/docker/execdriver/native/a5fc48167915f0367fc3cc52cec23f0de090711aae45c46054550b9b36d32b5b statsQueue:0xc20836e980 statsCollector:0xcd9be8}"
2016-05-26T09:29:04Z [WARN] Error getting cpu stats module="stats" err="No data in the queue" container="&{DockerID:a5fc48167915f0367fc3cc52cec23f0de090711aae45c46054550b9b36d32b5b}"
2016-05-26T09:29:04Z [WARN] Error getting cpu stats module="stats" err="No data in the queue" container="&{DockerID:4ff9719a31b0be20669fa26f97c91702c0c8ff3d899f1014659453d62829cd5a}"
2016-05-26T09:29:04Z [WARN] Error getting instance metrics module="tcs client" err="No task metrics to report"

Looking at the code it seems this could be fixed by #400.
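When the agent log is long, it can help to count which containers are affected. The sketch below tallies the `No data in the queue` warnings per Docker container ID. The line layout is inferred from the excerpts quoted above and is an assumption about the log format, not a documented contract; the function and pattern names are made up for illustration.

```python
import re
from collections import Counter

# Layout inferred from the quoted lines: "<timestamp> [LEVEL] <message ...>".
LINE_RE = re.compile(r"^(?P<ts>\S+) \[(?P<level>[A-Z]+)\] (?P<msg>.*)$")
DOCKER_ID_RE = re.compile(r"DockerID:(?P<id>[0-9a-f]+)")


def containers_without_stats(lines):
    """Count 'No data in the queue' warnings per Docker container ID."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("level") != "WARN":
            continue
        message = match.group("msg")
        if 'err="No data in the queue"' not in message:
            continue
        id_match = DOCKER_ID_RE.search(message)
        if id_match:
            counts[id_match.group("id")] += 1
    return counts


# Three of the lines quoted above, used as sample input.
SAMPLE = r'''
2016-05-26T09:29:04Z [WARN] Error getting cpu stats module="stats" err="No data in the queue" container="&{DockerID:a5fc48167915f0367fc3cc52cec23f0de090711aae45c46054550b9b36d32b5b}"
2016-05-26T09:29:04Z [WARN] Error getting cpu stats module="stats" err="No data in the queue" container="&{DockerID:4ff9719a31b0be20669fa26f97c91702c0c8ff3d899f1014659453d62829cd5a}"
2016-05-26T09:29:04Z [WARN] Error getting instance metrics module="tcs client" err="No task metrics to report"
'''

for container_id, count in containers_without_stats(SAMPLE.splitlines()).items():
    print(container_id[:12], count)
```

If every running container shows up in the tally, the problem is most likely host-wide (for example the state directory) rather than specific to one task.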

@aaithal
Contributor

aaithal commented May 26, 2016

@asmarques As you correctly pointed out, #400 should fix the issue as it removes the need for mounting any of the cgroup or execdriver volumes to the ECS Agent (In fact, we have a PR to not mount these volumes when running the ECS Agent with ecs-init).

@richardpen

This should be fixed in the ECS Agent v1.10.0 release. I'm closing this issue for now; please let us know if you have any issues.

@vladgh

vladgh commented Jun 1, 2016

@aaithal Just to make sure I got this right: the following lines are no longer needed in the run command for v1.10.0?

-v /sys/fs/cgroup:/sys/fs/cgroup:ro
-v /run/runc:/var/lib/docker/execdriver/native:ro

@samuelkarp
Contributor

@vladgh Yes, that's correct.

edibble21 pushed a commit to edibble21/amazon-ecs-agent that referenced this issue Jul 9, 2021
9 participants