[Tracking] AWS Cloudwatch metrics reporting issue with CGroupsV2 #585

karthikeyanvenkatraman · 2021-12-21T09:50:09Z

Description

We are using Flatcar AWS AMIs for our ECS cluster, and since the release of Flatcar stable version 2983.2.0 we are no longer able to get ECS CloudWatch metrics. It seems that ecs-agent is not compatible with cgroups v2.

Impact

After upgrading the instances to Flatcar AMI ami-08165d837cc8ef7f6, AWS ECS cluster and service metrics such as CPU Utilization and Memory Utilization are unavailable.

Environment and steps to reproduce

Set up an EC2 instance using the Flatcar AMI (ami-08165d837cc8ef7f6), which ships stable release 2983.2.0.
Use the latest version of ecs-agent and join the instance to the ECS cluster.

Expected behavior

The ECS instances are supposed to send metrics such as CPU Utilization and Memory Utilization to CloudWatch. However, we get the message below from the ecs-agent:
msg="cloudwatch metrics for container 7d8386b039d1a0726d863a74412824d39a97f984258ed315a60d58cb90cc5fbf not collected, reason (cpu): need at least 2 data points in queue to calculate CW stats set" module=engine.go
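To see how many containers are affected, the agent log can be scanned for this message. The following is a minimal sketch that assumes every affected line follows the exact wording shown above; the function name `parse_uncollected` is hypothetical and not part of ecs-agent.

```python
import re
from typing import Optional, Tuple

# Pattern modeled on the single log line quoted in this report (an assumption:
# other agent versions may word the message differently).
_UNCOLLECTED = re.compile(
    r'cloudwatch metrics for container (?P<id>[0-9a-f]+) not collected, '
    r'reason \((?P<kind>\w+)\): (?P<reason>[^"]+)'
)


def parse_uncollected(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (container_id, metric_kind, reason) or None if the line does not match."""
    match = _UNCOLLECTED.search(line)
    if match is None:
        return None
    return match.group("id"), match.group("kind"), match.group("reason").strip()
```

Feeding each line of the agent log through `parse_uncollected` and collecting the non-`None` results gives the set of container IDs whose metrics were skipped.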

Additional information

We could manually force the AMI to use legacy cgroups by following the doc at https://www.flatcar-linux.org/docs/latest/container-runtimes/switching-to-unified-cgroups/#starting-new-nodes-with-legacy-cgroups. However, this requires a restart and is time-consuming. We would like to know if there is an alternative fix that avoids a restart and still lets us use the new features that come with the latest stable releases.


tormath1 · 2021-12-21T13:48:17Z

Hi @karthikeyanvenkatraman ,

After quickly checking aws/amazon-ecs-agent, it seems the ecs-agent relies on legacy cgroups:

	github.com/containerd/cgroups v0.0.0-20170627184340-c3fc2b77b568

Could you open an issue on the repository to get a confirmation from AWS folks?

> We would like to know if there is any alternate fix available without restart and still be able to use the new features that comes up as part of the stable latest releases.

Since the cgroup hierarchy is set up by the init process (systemd), I'm not sure we can avoid restarting your nodes.
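To check which hierarchy a node actually booted with, one option is to look for the `cgroup.controllers` file, which the kernel exposes at the root of a cgroup v2 (unified) mount. This is a minimal sketch, assuming the hierarchy is mounted at `/sys/fs/cgroup`; the function name `detect_cgroup_mode` is hypothetical.

```python
from pathlib import Path


def detect_cgroup_mode(root: str = "/sys/fs/cgroup") -> str:
    """Return 'unified' for cgroup v2, or 'legacy' for v1 and hybrid layouts.

    The default mount point is an assumption; pass a different root if the
    hierarchy is mounted elsewhere.
    """
    # The root of a cgroup2 filesystem contains cgroup.controllers;
    # a v1 or hybrid layout does not have it at this level.
    if (Path(root) / "cgroup.controllers").is_file():
        return "unified"
    return "legacy"


if __name__ == "__main__":
    print(detect_cgroup_mode())
```

Running it before and after the reboot shows whether the switch to legacy cgroups took effect.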

karthikeyanvenkatraman · 2021-12-21T14:02:55Z

We reached out to AWS support on this issue, and they confirmed that ecs-agent does not support cgroups v2 at this time and that they have an open issue for it. We don't have an ETA for this yet.

tormath1 · 2021-12-23T08:50:37Z

@karthikeyanvenkatraman thanks again for your report, two documentation points have been added:

Let's keep this issue open as a tracking one for future generations.

karthikeyanvenkatraman · 2022-04-08T02:00:50Z

I see this is resolved now with the release of ecs agent v1.61.0 this morning.
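For anyone auditing a fleet, a small check against that version can flag instances still running an older agent. This is a rough sketch: the `1.61.0` threshold comes from this thread, while the helper names and the version strings being parsed are assumptions about how the version is reported.

```python
import re
from typing import Optional, Tuple

# First agent release reported in this thread as fixing metrics on cgroups v2.
MIN_CGROUP_V2_AGENT = (1, 61, 0)


def parse_agent_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Extract the first X.Y.Z triple from a version string, or None if absent."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def agent_has_cgroup_v2_fix(text: str) -> bool:
    """True if the reported agent version is at least 1.61.0."""
    version = parse_agent_version(text)
    return version is not None and version >= MIN_CGROUP_V2_AGENT
```

An unparseable string is treated as "not fixed", so unknown agents are flagged rather than silently passed.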

tormath1 · 2022-04-08T07:23:46Z

@karthikeyanvenkatraman this is great, it's really good news. Did you get a chance to test it?

karthikeyanvenkatraman · 2022-04-08T07:49:59Z

@tormath1, yes. We rolled out the changes and have not noticed any issues so far.

sayanchowdhury · 2023-09-08T12:16:52Z

I'm closing this issue as resolved. Feel free to reopen if you need to discuss more.

karthikeyanvenkatraman added the kind/bug Something isn't working label Dec 21, 2021

tormath1 added the platform/AWS label Dec 21, 2021

tormath1 mentioned this issue Dec 21, 2021

docs/container/cgroup: add AWS ECS agent as known issue flatcar-archive/flatcar-docs#194

Merged

tormath1 added the kind/docs label Dec 21, 2021

tormath1 changed the title ~~AWS Cloudwatch metrics reporting issue with CGroupsV2~~ [Tracking] AWS Cloudwatch metrics reporting issue with CGroupsV2 Dec 22, 2021

tormath1 removed the kind/docs label Dec 22, 2021

jepio added the area/cgroup2 Issues uncovered through the migration to cgroup2. label Jan 6, 2022

This was referenced Jan 14, 2022

Support unified cgroups (cgroups v2) aws/containers-roadmap#1535

Closed

Support Linux cgroup v2 aws/amazon-ecs-agent#3117

Closed

sayanchowdhury closed this as completed Sep 8, 2023

tormath1 mentioned this issue Sep 8, 2023

aws cgroupsv2 remove known issues flatcar-archive/flatcar-docs#333

Merged

github-actions bot mentioned this issue Nov 9, 2023

Monthly contributions report 2023-08-22 - 2023-09-21 #1245

Closed
