
Support unified cgroups (cgroups v2) #1535

Closed
dmazhar-cogniance opened this issue Oct 6, 2021 · 4 comments

@dmazhar-cogniance

Summary

It looks like cgroups v2 is not supported by ecs-agent. We are getting this error in the logs:

cloudwatch metrics for container XXX not collected, reason (cpu): need at least 2 data points in queue to calculate CW stats set" module=engine.go

Description

Hi team! We're using ecs-agent with Flatcar OS, and the recent beta migrated to cgroups v2, which broke the CloudWatch metrics for our ECS clusters. Could you please let us know whether cgroups v2 is supported by ecs-agent, or whether there are any plans to support it? For now we have locked our OS version, but we need to plan what to do next.

@ubhattacharjya ubhattacharjya transferred this issue from aws/amazon-ecs-agent Oct 18, 2021
@ubhattacharjya

#1118

@Phylu

Phylu commented Jan 14, 2022

My situation is as follows:

  • The host system is Flatcar stable, which I install as an AMI from the AWS Marketplace.
  • I upgraded Flatcar to a version newer than 2983.2.0 [1], which enables cgroups v2 by default. Cgroups v2 was added to the Linux kernel in 2014 and has been adopted by Linux distributions since 2019 [2].
  • The ECS agent does not support cgroups v2, so metrics collection fails [3].
  • I also see memory leaks: the ECS agent uses multiple GB of memory on the host, so the system degrades and I sporadically lose the service connection. This is even worse than the missing metrics.

The problem with pinning Flatcar to a working version from before October 2021 is that it contains many security vulnerabilities that have been resolved in the latest stable version.

I really hope that this can be prioritized accordingly.

[1] https://www.flatcar-linux.org/releases/#release-2983.2.0
[2] https://medium.com/nttlabs/cgroup-v2-596d035be4d7
[3] flatcar/Flatcar#585

@sparrc

sparrc commented Feb 14, 2022

Hello, we are currently working on supporting unified cgroups as part of supporting Amazon Linux 2022 (which uses cgroups v2 by default). The first PR, which adds task resource limits using the new cgroups structure, is here; next we will work on the metrics error that you pasted above.

sparrc added a commit to sparrc/amazon-ecs-agent that referenced this issue Feb 15, 2022
closes aws/containers-roadmap#1535
closes aws#3117

This adds support for task-level resource limits when running on unified
cgroups (aka cgroups v2) with the systemd cgroup driver.

Cgroups v2 has introduced a cgroups format that is not backward compatible
with cgroups v1. In order to support both v1 and v2, we have added a config
variable to detect which cgroup version the ecs agent is running with.
The containerd/cgroups library is used to determine which mode it is using
on agent startup.

Cgroups v2 can no longer provide per-CPU usage stats, so this validation
was removed, since we never used it either.
sparrc added a commit to aws/amazon-ecs-agent that referenced this issue Mar 2, 2022
* Support Unified Cgroups (cgroups v2)

closes aws/containers-roadmap#1535
closes #3117

This adds support for task-level resource limits when running on unified
cgroups (aka cgroups v2) with the systemd cgroup driver.

Cgroups v2 has introduced a cgroups format that is not backward compatible
with cgroups v1. In order to support both v1 and v2, we have added a config
variable to detect which cgroup version the ecs agent is running with.
The containerd/cgroups library is used to determine which mode it is using
on agent startup.

Cgroups v2 can no longer provide per-CPU usage stats, so this validation
was removed, since we never used it either.

* wip

* update cgroups library with nil panic bugfix

* Initialize and toggle cgroup controllers
@karthikeyanvenkatraman

It looks like this has been sorted out with the release of ecs-agent v1.61.0. Thank you @sparrc and the ECS team.


5 participants