exec-driver: nomad + health-check process are part of memory limit cgroup #4491

Closed
fho opened this issue Jul 10, 2018 · 3 comments

Comments

@fho
Contributor

fho commented Jul 10, 2018

The following was observed with Nomad v0.8.3.

Based on the description in the docs, I would expect the memory parameter of the resources stanza to be set to the maximum amount of memory that my application uses.

But when an application is run with the exec-driver, the memory limit of the cgroup accounts not only for the application process but also for the nomad executor and health-check processes.

This is unintuitive, undocumented, and error-prone: the application can get OOM killed even though the application process uses less memory than the specified memory limit.

Setting the memory limit correctly is hard in this scenario:

  • the memory required by the nomad executor process is unknown: it's not documented, depends on the architecture, and can change between nomad versions,
  • the memory required for the nomad internal health-checks is unknown, and if an external health-check application is used, it's not clear that its memory consumption counts toward the memory limit.

To make the memory limits more straightforward, the memory limit should only apply to the application process itself.
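
To illustrate the accounting described above, here is a minimal Python sketch that lists every process in a task's cgroup together with its resident memory. It assumes a Linux host with procfs, and the cgroup directory you pass in is your own assumption (the path Nomad uses is not stated in this issue). The function name `rss_by_process` is hypothetical. Note that summed RSS is only an approximation of what the memory cgroup charges, since the kernel also counts page cache and kernel memory.

```python
from pathlib import Path


def rss_by_process(cgroup_dir, proc_root="/proc"):
    """Return {pid: (name, rss_kib)} for each process listed in cgroup.procs.

    cgroup_dir is an assumption: pass the directory of the task's memory cgroup.
    """
    result = {}
    for pid in Path(cgroup_dir, "cgroup.procs").read_text().split():
        status = Path(proc_root, pid, "status")
        try:
            lines = status.read_text().splitlines()
        except OSError:
            continue  # the process exited between the two reads
        fields = dict(line.split(":", 1) for line in lines if ":" in line)
        name = fields.get("Name", "?").strip()
        # VmRSS is missing for kernel threads; count those as 0 KiB.
        rss_kib = int(fields.get("VmRSS", "0 kB").split()[0])
        result[int(pid)] = (name, rss_kib)
    return result
```

Calling `rss_by_process("<path to the task's memory cgroup>")` on the client node should show the executor and any check processes next to the application, if they are members of that cgroup.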

@preetapan
Contributor

@fho The executor is responsible for managing the lifecycle of the application, so it's a desired feature to have it in the same cgroup. As implemented, the tracking inside the executor (for health checks and stdout/stderr management) should not be a memory hog; it has a pretty low footprint.

I am going to close this issue out because the cgroup behavior is working as desired. If you can show that your application used less memory than it specified and got OOM killed because of the executor, please open another issue with the details, including the job file and top or htop output showing the resources used.
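
Alongside top or htop output, the cgroup's own counters can help show how close a task came to its limit. The following is a rough Python sketch, assuming the cgroup v1 memory controller and its standard files (`memory.limit_in_bytes`, `memory.max_usage_in_bytes`, `memory.failcnt`); the function name `memory_headroom` and the directory argument are hypothetical.

```python
from pathlib import Path


def memory_headroom(cgroup_dir):
    """Read cgroup v1 memory counters and report the gap between limit and peak.

    cgroup_dir is an assumption: the task's directory under the memory controller.
    """
    def read_int(name):
        return int(Path(cgroup_dir, name).read_text().strip())

    limit = read_int("memory.limit_in_bytes")
    peak = read_int("memory.max_usage_in_bytes")
    return {
        "limit_bytes": limit,
        "peak_bytes": peak,
        # failcnt counts how often usage hit the limit, not OOM kills as such.
        "limit_hits": read_int("memory.failcnt"),
        "headroom_bytes": limit - peak,
    }
```

A peak close to the limit combined with a small application RSS would suggest that other members of the cgroup, or page cache, account for the difference.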

@schmichael
Member

Just to add a little more detail: We plan on improving the script check documentation to clarify that it runs inside the container. That's definitely something we need to mention! Allowing script checks to run outside the task's container and resource limits would be a major security and isolation issue.

We also have longer-term plans to make setting memory requirements easier, such as soft limits. Unfortunately, we don't have a timeline for those yet.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 29, 2022