
Metricbeat 7.12.1 kubernetes.container.memory.usage.limit.pct is calculated incorrectly #25657

Closed · F-Potter opened this issue May 11, 2021 · 11 comments · Fixed by #29547
Labels: Team:Integrations, :Windows

@F-Potter

Hi,

Metricbeat provides the kubernetes.container.memory.usage.limit.pct value by taking kubernetes.container.memory.usage.bytes and dividing it by memory.max_usage_in_bytes (which can be defined by setting a resource memory limit in the Kubernetes deployment).

Kubernetes, however, OOM-kills containers based on kubernetes.container.memory.workingset.bytes. That means kubernetes.container.memory.usage.limit.pct is currently not a good value to alert on: kubernetes.container.memory.usage.bytes is higher than kubernetes.container.memory.workingset.bytes, so the metric gives false positives that the container is about to be OOM killed, while in reality it is still fine until kubernetes.container.memory.workingset.bytes reaches the resource memory limit.

Is it possible to adjust the kubernetes.container.memory.usage.limit.pct based on the kubernetes.container.memory.workingset.bytes?
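
For illustration, here is a minimal sketch of the two candidate calculations. This is not Metricbeat's actual code; the variable names and values are made up, only the metric names in the comments come from the discussion above:

```go
package main

import "fmt"

func main() {
	// Hypothetical values for a container with a 768 MiB memory limit.
	const mib = 1024.0 * 1024.0
	limitBytes := 768 * mib      // resources.limits.memory from the deployment spec
	usageBytes := 766 * mib      // kubernetes.container.memory.usage.bytes
	workingSetBytes := 580 * mib // kubernetes.container.memory.workingset.bytes

	// What usage.limit.pct reflects today: total usage relative to the limit.
	usageLimitPct := usageBytes / limitBytes

	// What the reporter wants to alert on: the working set relative to the
	// limit, since the OOM killer acts on the working set.
	workingSetLimitPct := workingSetBytes / limitBytes

	fmt.Printf("usage.limit.pct      = %.3f\n", usageLimitPct)      // ~0.997, looks critical
	fmt.Printf("workingset.limit.pct = %.3f\n", workingSetLimitPct) // ~0.755, still has headroom
}
```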

@jsoriano
Member

Hi @F-Potter,

kubernetes.container.memory.workingset.bytes is only greater than zero on Windows. Are your affected nodes running Windows?

If that is the case, I guess this is similar to the issue with pods, solved by #25428.

@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@F-Potter
Author

Hi @jsoriano,

No, the affected nodes are all running Ubuntu 18.04.5 LTS.

@F-Potter
Author

[Screenshot attached, 2021-05-11 13:44: the metric values referenced below]

@F-Potter
Author

As you can see here, the working set is 580 MB and the memory limit is 768 MB (it doesn't show in the output), but limit.pct says 0.998, which is based on memory.usage.bytes (766.5 MB).
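
Working those reported numbers out: 766.5 MB / 768 MB ≈ 0.998, which matches the reported limit.pct, while the working set gives 580 MB / 768 MB ≈ 0.755. Alerting on the current metric therefore fires long before the container is actually near being OOM killed.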

@brianharwell
Contributor

@F-Potter At what point does the pod get OOM killed? Do you have an example of the usage.limit.pct going above 100%?

@jsoriano
Member

> As you can see here, the working set is 580 MB and the memory limit is 768 MB (it doesn't show in the output), but limit.pct says 0.998, which is based on memory.usage.bytes (766.5 MB).

Oh yes, you are right, this value is also available on other OSes. This will need further investigation.

@F-Potter
Author

@brianharwell I will look at it; I'm not sure whether it stops at 100% or goes over it. But the issue is more that the wrong value is measured, since Kubernetes looks at kubernetes.container.memory.workingset.bytes. Monitoring a different value results in a different percentage, which in turn results in incorrect alerting.

@F-Potter
Author

The limit.pct stops at 1 (100%), so it won't go higher than that.

@brianharwell
Contributor

I am curious to see how this works on Linux, because on Windows I get memory errors when the working set is at 72% of the memory limit. I can try my test app on Linux and see what happens.

@faec
Contributor

faec commented Aug 26, 2021

This doesn't look strictly incorrect -- usage.limit.pct is still measuring a correct, useful value, and it's the value corresponding to usage.bytes, which is what would be expected from the metric name. I think the confusion here is that "usage" (as I understand it) is the full allocated memory of the container, including pages that may be on disk, idle, etc. So we wouldn't expect to see anything go above 100%, but also, "usage" of 99% isn't necessarily worrying the way a working set of that size would be. Maybe we should also be calculating and providing memory.workingset.limit.pct so users can monitor the appropriate one for their situation?
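
As a rough sketch of that suggestion (purely illustrative, not the actual change that closed this issue; the package and helper names below are made up), such a metric could be derived from the same inputs the module already has:

```go
package memory

// buildMemoryFields is a hypothetical helper (not the actual Metricbeat code)
// showing how both percentages could be reported side by side, so users can
// alert on whichever value matches their situation.
func buildMemoryFields(usageBytes, workingSetBytes, limitBytes float64) map[string]interface{} {
	fields := map[string]interface{}{
		"usage.bytes":      usageBytes,
		"workingset.bytes": workingSetBytes,
	}
	if limitBytes > 0 {
		fields["usage.limit.pct"] = usageBytes / limitBytes
		// The additional metric suggested above: working set relative to the limit.
		fields["workingset.limit.pct"] = workingSetBytes / limitBytes
	}
	return fields
}
```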
