MemoryUtilization includes kernel caches #3594
Hi, thank you for reporting this! I am looking into it now.

As you mentioned, the ECS Agent currently uses docker stats to calculate the value it sends to CloudWatch as MemoryUtilization, and docker stats reports an inflated memory usage (in bytes) value. Since this enhancement is already being tracked as an issue in the docker cli repo, I will close this issue in favor of that one to avoid duplicated and potentially divergent efforts. Please feel free to reach out should you have any additional concerns or information.
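To make the inflation concrete, here is a minimal arithmetic sketch. The numbers are illustrative, loosely modeled on the figures reported later in this issue (they are not taken from a live host), and it assumes a docker-stats-style calculation that subtracts page cache from raw cgroup usage while kernel slab memory (dentry/inode caches) stays included:

```shell
# Illustrative numbers only (not measured): a cgroup's raw usage includes
# kernel slab, but the "cache" figure subtracted out covers page cache only.
usage_mb=1050        # memory.usage_in_bytes, inflated by ~830MB of slab
cache_mb=60          # "cache" row of memory.stat (page cache, not slab)
rss_mb=158           # "rss" row of memory.stat: the app's real footprint
echo "reported: $(( usage_mb - cache_mb ))MB"   # prints "reported: 990MB"
echo "actual:   ${rss_mb}MB"                    # prints "actual:   158MB"
```

Because the slab is charged to usage but never subtracted, the reported number tracks kernel cache growth rather than application memory.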
I've found that setting up /tmp as a bind mount avoids the problem.
@luhn could you share what exactly you mean by "setting up /tmp as a bind mount"? The default with an Ubuntu image seems to treat /tmp as part of the root overlay filesystem:

```
df /tmp/
Filesystem     1K-blocks     Used Available Use% Mounted on
overlay         30787492 11544624  17653620  40% /
```
I added a volume to the task and set a mount point with /tmp as the container path.
Our workload makes heavy use of tempfiles, like @luhn. Happily, the /tmp volume workaround works for us as well. But I wonder if AWS should research better default settings for vm.vfs_cache_pressure, or at least make it configurable for Fargate tasks.

```typescript
// CDK config for a Fargate task
task.addVolume({ name: 'tmp' });
task.defaultContainer!.addMountPoints({ sourceVolume: 'tmp', containerPath: '/tmp', readOnly: false });
```

```dockerfile
# Dockerfile changes to fix the 0755 permissions
RUN mkdir -p /tmp && chmod 1777 /tmp
VOLUME ["/tmp"]
```
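For reference, a sketch of the host-level knobs mentioned above. These apply to EC2 container hosts only (Fargate does not expose them), and the `200` value is purely illustrative, not a recommendation:

```shell
# Host-level commands (EC2 only; requires root). Not available on Fargate.
cat /proc/sys/vm/vfs_cache_pressure        # default is 100
sudo sysctl -w vm.vfs_cache_pressure=200   # reclaim dentries/inodes more aggressively
df /tmp                                    # verify /tmp is its own mount, not overlay
```

Higher vfs_cache_pressure values make the kernel reclaim dentry/inode cache more eagerly relative to page cache, which would keep the slab (and thus the inflated MemoryUtilization figure) smaller.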
Summary
CloudWatch is reporting what looks to be a memory leak in my ECS task. MemoryUtilization has been rising continually since the last deployment and currently sits at 330% with no sign of stopping.
Container Insights corroborates this, reporting that my `app` container is using 990MB. However, memory usage on the entire host is only 441MB and has been stable, so the number ECS is reporting cannot be accurate.
What's happening is that MemoryUtilization includes kernel slabs, notably dentry. Every time a file is created, information is saved in the dentry cache, but it is not cleared when the file is deleted. So for applications like mine that create many short-lived files, the dentry cache can inflate to a massive size.
This unfortunately makes MemoryUtilization meaningless and leaves me with no insight into the memory usage of my containers.
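The file-churn pattern described above can be reproduced with a minimal sketch. This only demonstrates the workload shape; actually observing the slab growth requires reading `/proc/slabinfo` on the host:

```shell
# Minimal reproduction sketch: create and delete many short-lived files.
# Each creation populates the kernel dentry cache; deletion does not evict
# the entry, so the slab (visible in /proc/slabinfo) keeps growing.
dir=$(mktemp -d)
for i in $(seq 1 1000); do
  : > "$dir/f$i"      # create an empty file...
  rm "$dir/f$i"       # ...and delete it immediately
done
rmdir "$dir"
echo "churned 1000 files"
```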
Description
As mentioned above, Container Insights reports 990MB. Docker stats also reports this. (It shows 1108MB because it was run a few hours later.) However, host memory use is only 440MB.
If we look into the container's `memory.stat`, we can see RSS is 158MB (about what I would expect), with `cache`, `inactive_files`, and others showing modest amounts that would not account for the discrepancy. `memory.usage_in_bytes` shows a very large value. I believe ECS takes `usage_in_bytes - cache`, so that's where our inflated value is coming from.

If we look at kmem use, we can see that it's extremely high, which I believe accounts for the discrepancy.
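For anyone retracing this inspection, these are the standard cgroup v1 memory-controller files on the EC2 host; `<id>` stands in for the container's cgroup directory (e.g. under `/sys/fs/cgroup/memory/docker/`) and is a placeholder, not a literal path:

```shell
# cgroup v1 paths on the container host; <id> is the container's cgroup dir.
cat /sys/fs/cgroup/memory/docker/<id>/memory.stat                 # rss, cache, inactive_file
cat /sys/fs/cgroup/memory/docker/<id>/memory.usage_in_bytes       # the inflated figure
cat /sys/fs/cgroup/memory/docker/<id>/memory.kmem.usage_in_bytes  # kernel memory (slab etc.)
sudo grep dentry /proc/slabinfo                                   # per-slab breakdown
```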
And if we break that down, we can see that `dentry` is absolutely massive.

And finally, if we clear the caches (`echo 3 | sudo tee /proc/sys/vm/drop_caches`), memory usage drops from several hundred percent to about 70%, proving that it is indeed a kernel cache that is inflating MemoryUtilization.

Environment Details
t3.small running Amazon Linux 2 (amzn2-ami-ecs-hvm-2.0.20230214-x86_64-ebs, ami-0ae546d2dd33d2039), ECS Agent 1.68.2.

(This was initially observed on Fargate, but I switched to EC2 to facilitate debugging.)
docker info output:
Prior art

- `memory.stat.cache`: exclude `inode`, `dentry`, and other slabs from `MEM USAGE` (docker/cli#3171)