
Nomad task got OOM killed when it was using only ~70% of its MemoryMB limit #4495

Closed
fho opened this issue Jul 11, 2018 · 9 comments


fho commented Jul 11, 2018

This is the same issue as described in #4491, but filed as a bug report.

Nomad version

Nomad v0.8.3

Issue

We have a Nomad job that runs an application called claimsearch-service with the exec driver.
The memory limit is set to 50 MiB in the Nomad job file.
The application got OOM killed when it was only using 35.35 MB RSS.

The memory cgroup contained the following processes with the following RSS usage:

Process               RSS
nomad                 13.75 MiB
claimsearch-service   35.36 MiB
grpc-health-check      4.93 MiB

Expected behaviour

  • The task is not OOM killed as long as it uses less RSS memory than configured in the MemoryMB parameter of the resources stanza in the Nomad job file.
  • The configured memory limit applies only to the executed Nomad task.

That the memory consumption of other processes is counted against the task's memory limit is unintuitive, it is not documented, and it makes it difficult to calculate the correct MemoryMB value for a task.
See also: #4491

OOM kill Kernel log

Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: Task in /nomad/ebf75298-ba47-98a8-28e5-a08daf20d60e killed as a result of limit of /nomad/ebf75298-ba47-98a8-28e5-a08daf20d60e
Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: memory: usage 51200kB, limit 51200kB, failcnt 16402868
Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: Memory cgroup stats for /nomad/ebf75298-ba47-98a8-28e5-a08daf20d60e: cache:36KB rss:51164KB rss_huge:16384KB mapped_file:8KB dirty:0KB writeback:0KB inactive_anon:25624KB active_anon:23492KB inactive_file:0KB active_file:0KB unevictable:0KB
Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: [31962]     0 31962    82711     3521      44       5       61             0 nomad
Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: [31970]    33 31970    13807     9046      31       5        0             0 claimsearch-ser
Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: [30195]    33 30195    28465     1262      19       6        0             0 grpc-health-che
Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: Memory cgroup out of memory: Kill process 31970 (claimsearch-ser) score 709 or sacrifice child
Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: Killed process 31970 (claimsearch-ser) total-vm:55228kB, anon-rss:36184kB, file-rss:0kB
Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: grpc-health-che invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=0
Jul 10 08:56:03 prd-sisu-nomad-client-1.localdomain kernel: grpc-health-che cpuset=ebf75298-ba47-98a8-28e5-a08daf20d60e mems_allowed=0

Job file

The full job file can be found at: http://dpaste.com/05YWFVW
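
The dpaste link may no longer resolve. For context, here is a hypothetical minimal sketch of the relevant parts of such a job file, assuming typical Nomad 0.8 syntax; the job name, paths, and check parameters are illustrative and not taken from the original file:

job "claimsearch" {
  # datacenters value is illustrative
  datacenters = ["dc1"]

  group "claimsearch" {
    task "claimsearch-service" {
      driver = "exec"

      config {
        # path is illustrative
        command = "local/claimsearch-service"
      }

      resources {
        # MemoryMB: hard limit of the task's memory cgroup; as described in this
        # issue, the nomad executor and script checks share this cgroup
        memory = 50
      }

      service {
        name = "claimsearch-service"

        check {
          type     = "script"
          name     = "grpc-health-check"
          # path is illustrative
          command  = "local/grpc-health-check"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}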

@preetapan
Contributor

@fho thanks for the details. We plan to fix executor memory utilization in the upcoming release; 13 MB is rather high.


fho commented Jul 13, 2018

@preetapan
The amount of memory that the nomad executor consumes is not the issue.
As long as the task is in a cgroup together with other processes, the task can get OOM killed even though it uses less
memory than its configured memory limit.

Let's assume the memory consumption of the nomad executor were lowered from 13 MB to 5 MB.
Now I run a task with a low memory footprint of 2 MB via Nomad and configure its memory limit to 5 MB to have some buffer.
The task would still get OOM killed because the cgroup memory limit is reached:
2 MB task memory + 5 MB nomad executor memory > 5 MB memory limit

@preetapan
Contributor

@fho see the comments my co-worker and I already made about why the executor and script checks have to be in the same cgroup.

#4491 (comment)
#4491 (comment)

There's always going to be some amount of overhead from using the executor, and we will address that with a mechanism that is still TBD - we will likely either account for the overhead when creating the container, or use soft limits.
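
Until such a mechanism exists, the practical workaround implied in this thread is to pad the configured limit by the observed overhead. A hypothetical resources stanza based on the numbers reported above (executor ~14 MiB, grpc-health-check ~5 MiB, application ~36 MiB); the exact overhead will vary with the Nomad version and the checks in use:

resources {
  # ~36 MiB application + ~14 MiB nomad executor + ~5 MiB script check,
  # rounded up for some headroom (illustrative values taken from this report)
  memory = 60
}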


fho commented Jul 13, 2018

@preetapan

The executor is responsible for managing the lifecycle of the application, so it's a desired feature to have it be in the same cgroup.
[..]
Allowing script checks to run outside the task's container and resource limits would be a major security and isolation issue.

I don't understand yet why they have to be in the same cgroup.
It would be great if you could elaborate on it.

  • What would be the disadvantages of other solutions, like having each check and each nomad-executor in its own memory cgroup?
  • What are the advantages of having them in the same memory cgroup?
  • What are the concrete security and isolation issues if each check and each nomad-executor is in its own memory cgroup?

thanks a lot


memelet commented Sep 6, 2018

I'm having a hard time understanding this. I have a container with a 600 MB limit and a Java process with heap+non-heap usage of ~360 MB, and it's getting OOM killed every 10 minutes or so. It can't be that the Nomad services are using 240 MB? And if not, how can I tell why the process is getting killed?

@sirkjohannsen

I would just like to add that with Nomad 0.9 the resource footprint of the Nomad processes within the cgroup has increased even more.
Most of our lightweight microservices now need double the resources configured in Nomad compared to 0.8.


notnoop commented May 31, 2019

Wanted to clarify the behavior of Nomad 0.9:


notnoop commented Dec 13, 2019

I'm closing this ticket as the exec driver has changed significantly since 0.8, and I believe the notes here are either addressed or no longer relevant. I'd encourage users experiencing memory issues to create a new issue against 0.10.

Since my May 31 comment, we have made the following changes:

Please let us know of any issues you see and we will follow up. Thanks!

notnoop closed this as completed on Dec 13, 2019