[Tracking] AWS Cloudwatch metrics reporting issue with CGroupsV2 #585

Closed
karthikeyanvenkatraman opened this issue Dec 21, 2021 · 7 comments
Labels
area/cgroup2 Issues uncovered through the migration to cgroup2. kind/bug Something isn't working platform/AWS

Comments

@karthikeyanvenkatraman

Description

We are using Flatcar AWS AMIs for our ECS cluster, and since the release of Flatcar stable version 2983.2.0 we are no longer able to get ECS CloudWatch metrics. It seems that ecs-agent is not compatible with cgroups v2.

Impact

After upgrading the instances to Flatcar AMI ami-08165d837cc8ef7f6, AWS ECS cluster and service metrics such as CPU Utilization and Memory Utilization are unavailable.

Environment and steps to reproduce

  1. Set up an EC2 instance using the Flatcar AMI (ami-08165d837cc8ef7f6), which ships stable release 2983.2.0.
  2. Use the latest version of ecs-agent and join the instance to the ECS cluster.

Expected behavior

The ECS instances are supposed to send metrics such as CPU Utilization and Memory Utilization to CloudWatch. However, we get the messages below from the ecs-agent:
msg="cloudwatch metrics for container 7d8386b039d1a0726d863a74412824d39a97f984258ed315a60d58cb90cc5fbf not collected, reason (cpu): need at least 2 data points in queue to calculate CW stats set" module=engine.go
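
The log line says that at least two data points are needed before CloudWatch stats can be calculated. One plausible reading (an assumption, not confirmed in this thread) is that a utilization figure is derived from the difference between consecutive cumulative CPU readings, so if no readings can be collected from the cgroup hierarchy, the queue never fills. The sketch below only illustrates that idea; `CpuSampleQueue` and its methods are hypothetical names and are not taken from the ecs-agent source.

```python
from collections import deque


class CpuSampleQueue:
    """Hypothetical queue of cumulative CPU samples (not ecs-agent code)."""

    def __init__(self, maxlen=120):
        # Each entry is (timestamp_seconds, cumulative_cpu_seconds).
        self.samples = deque(maxlen=maxlen)

    def add(self, timestamp_s, cumulative_cpu_s):
        self.samples.append((timestamp_s, cumulative_cpu_s))

    def utilization_percent(self):
        # A rate needs a delta, so a single sample is not enough.
        if len(self.samples) < 2:
            raise ValueError("need at least 2 data points")
        (t0, c0), (t1, c1) = self.samples[0], self.samples[-1]
        if t1 <= t0:
            raise ValueError("timestamps must be increasing")
        # CPU seconds consumed per wall-clock second, as a percentage.
        return 100.0 * (c1 - c0) / (t1 - t0)
```

With one sample the call raises; after a second sample it returns the average utilization over the window.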

Additional information

We could manually force the AMI to use legacy cgroups by following the doc at https://www.flatcar-linux.org/docs/latest/container-runtimes/switching-to-unified-cgroups/#starting-new-nodes-with-legacy-cgroups. However, this requires a restart and is time-consuming. We would like to know whether there is an alternative fix that avoids a restart and still lets us use the new features that come with the latest stable releases.

@karthikeyanvenkatraman karthikeyanvenkatraman added the kind/bug Something isn't working label Dec 21, 2021
@tormath1
Contributor

Hi @karthikeyanvenkatraman ,

After a quick check of aws/amazon-ecs-agent, it seems the ecs-agent relies on legacy cgroups:

	github.com/containerd/cgroups v0.0.0-20170627184340-c3fc2b77b568

Could you open an issue on the repository to get a confirmation from AWS folks?

> We would like to know if there is any alternate fix available without restart and still be able to use the new features that comes up as part of the stable latest releases.

Since the cgroup hierarchy is set up by the init process (systemd) at boot, I'm not sure we can avoid restarting your nodes.
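
To see which cgroup layout a node is actually running, a common heuristic is to look for the `cgroup.controllers` file, which exists at the root of a cgroup v2 mount. The sketch below is a minimal check under that assumption; the function name `cgroup_mode` and the returned labels are illustrative, and the per-controller directory names used to recognize a legacy layout (`memory`, `cpu`) are assumed to be mounted under the given root.

```python
from pathlib import Path


def cgroup_mode(root="/sys/fs/cgroup"):
    """Best-effort guess of the cgroup layout under `root`."""
    root = Path(root)
    # cgroup v2 (unified): the v2 root exposes cgroup.controllers directly.
    if (root / "cgroup.controllers").is_file():
        return "unified"
    # Hybrid: v1 controllers plus a v2 hierarchy mounted at <root>/unified.
    if (root / "unified" / "cgroup.controllers").is_file():
        return "hybrid"
    # Legacy (v1): one directory per controller, e.g. memory or cpu.
    if (root / "memory").is_dir() or (root / "cpu").is_dir():
        return "legacy"
    return "unknown"


if __name__ == "__main__":
    print(cgroup_mode())
```

A result of `unified` on a node running an agent that only understands legacy cgroups would be consistent with the missing metrics described in this issue.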

@karthikeyanvenkatraman
Author

We reached out to AWS support about this issue. They confirmed that ecs-agent doesn't support cgroups v2 at this time and that they have an open issue for it. We don't have an ETA yet.

@tormath1 tormath1 changed the title AWS Cloudwatch metrics reporting issue with CGroupsV2 [Tracking] AWS Cloudwatch metrics reporting issue with CGroupsV2 Dec 22, 2021
@tormath1
Contributor

@karthikeyanvenkatraman thanks again for your report. Two documentation points have been added:

Let's keep this issue open as a tracking one for future generations.

@jepio jepio added the area/cgroup2 Issues uncovered through the migration to cgroup2. label Jan 6, 2022
@karthikeyanvenkatraman
Author

I see this is now resolved with the release of ecs-agent v1.61.0 this morning.
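
For fleets that mix agent versions, a small check against the v1.61.0 threshold mentioned above can help decide which nodes are safe to run on cgroups v2. This is a sketch only: the function names are hypothetical, and the example version string format is an assumption about how the agent reports itself, so adjust the parsing to whatever your tooling actually returns.

```python
import re

# Threshold taken from this thread: the fix was reported in v1.61.0.
MIN_CGROUP_V2_VERSION = (1, 61, 0)


def parse_agent_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple from a version string."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_cgroup_v2(text):
    # Tuples compare numerically element by element, so 1.100.0 > 1.61.0.
    return parse_agent_version(text) >= MIN_CGROUP_V2_VERSION
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "1.100.0" sorts before "1.61.0" lexicographically.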

@tormath1
Contributor

tormath1 commented Apr 8, 2022

@karthikeyanvenkatraman this is great, really good news. Did you get a chance to test it?

@karthikeyanvenkatraman
Author

@tormath1, yes. We rolled out the changes and have not noticed any issues so far.

@sayanchowdhury
Member

I'm closing this issue as resolved. Feel free to reopen if you need to discuss more.
