[rfc] how best to reduce runc's effects on cgroup memory limits #4021
Originally posted by @cyphar in #3987 (comment)
Thanks! Also /cc @neersighted who was in one of those conversations at the time.
The following is a conceptually more "radical" approach, with a lot of semantic changes, but it is something to bear in mind. I was at All Systems Go last week, and systemd is trying an approach where it serializes the info the child will need and forks into a small child; the child then recovers the serialized info, applies it, and execs. This seems promising for systemd as a way to reduce memory usage, as the child can be much smaller. I think they expect to merge that PR this week; it is called systemd-executor or something like that (I haven't searched for the PR, though).
@rata That is very similar to how runc currently works. We fork a child ( That being said, I do wonder where the 5-7MB of memory usage is coming from. We do a lot of work in EDIT: Some local tests on my machine show that the peak memory usage by runc before starting the container is actually about 2.2-3MB. It seems possible that something about our CI setups leads to more memory usage (up to 7MB on our CentOS CI machines). While 2-3MB is worse than 1MB or 0MB, I don't think it's that unreasonable. These tests were done with #3987 applied. I can try to go back to a pre-memfd version of runc if needed but
@cyphar no, it is very different because, while runc forks several times, it never execs into a very small binary to apply the cgroup limits. It just applies them while running the runc binary. Therefore, the runc binary of ~13MB imposes a floor on the cgroup memory limit.
The runc binary size doesn't affect the cgroup limit. I can run a 2.5MB limit container (with I think this is because we never re-exec runc after joining the cgroup, so the memory accounting for the binary is attributed to the host. The memory usage also isn't from binary cloning; I think that has been pretty well-established at this point. My pet theory was that runc exits so quickly that the GC doesn't have a chance to run (if you try to run with
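The GC theory above can be illustrated in isolation with a small Go sketch. It says nothing about runc's real allocation pattern: the `heapBeforeAfterGC` helper and the `sink` variable are hypothetical names, and the sketch assumes no Go memory limit is set. It only shows the general behavior that short-lived garbage keeps counting toward the Go heap until a collection cycle actually runs, which a very short-lived process may never reach. Note that `HeapAlloc` is the Go runtime's own accounting, not what the cgroup charges.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink keeps the compiler from optimizing the allocations away.
var sink []byte

// heapBeforeAfterGC allocates chunks*chunkSize bytes of garbage with the
// collector disabled, then reports the heap size before and after a forced
// collection. The helper name and parameters are illustrative.
func heapBeforeAfterGC(chunks, chunkSize int) (before, after uint64) {
	// Disable automatic collection so the garbage is guaranteed to pile up,
	// mimicking a process that exits before the first GC cycle.
	old := debug.SetGCPercent(-1)
	defer debug.SetGCPercent(old)

	for i := 0; i < chunks; i++ {
		sink = make([]byte, chunkSize)
	}
	sink = nil // everything allocated above is now unreachable

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	before = m.HeapAlloc // still includes the unreachable garbage

	runtime.GC() // force a full collection
	runtime.ReadMemStats(&m)
	after = m.HeapAlloc
	return before, after
}

func main() {
	before, after := heapBeforeAfterGC(512, 4096) // about 2 MiB of garbage
	fmt.Printf("heap before GC: %d KiB, after GC: %d KiB\n",
		before/1024, after/1024)
}
```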
Originally posted by @thaJeztah in #3987 (comment)