[Question] Telemetry documentation inaccuracies #7773

Open
m1keil opened this issue Apr 22, 2020 · 8 comments
Labels: theme/docs (Documentation issues and enhancements), type/bug

m1keil commented Apr 22, 2020

I'm running Nomad (0.10.2) with the Docker task driver (19.03.8) and Prometheus metrics enabled.

In the telemetry docs it says:

  • nomad.client.allocs.<Job>.<TaskGroup>.<AllocID>.<Task>.cpu.total_percent: Total CPU resources consumed by the task across all cores

First of all, it seems like the docs are a bit out of date with regard to the labels & metric names, as they have now changed to nomad.client.allocs.cpu.system/user/total_percent plus labels, similar to how Host Metrics are documented (the post-0.7 change).

The description of the metric is confusing as well.
The total_percent metric can easily spike above 100%, which means it's not "across all cores" but rather a sum over all cores. Maybe it's a language barrier on my side, but I was expecting a "normalized" value here, i.e. on a 4-core machine, having 1 core at 100% use would lead to 25% in this metric. Instead we get 100% here (so the max is 400%).

Because Nomad doesn't expose any "number of cores" metric, it's hard to estimate how high the utilization really is. Additionally, there are no metrics that can tell how far we are from the CPU resource limit, in a similar fashion to what is graphed on the allocation page in the Nomad UI.
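To illustrate what I mean by "normalized": if the per-core host CPU series could be used to derive a core count (an assumption on my part - I haven't verified the labels, and the node_id join label may differ on other setups), the query I have in mind would look something like this in PromQL:

  # Hypothetical sketch: divide the per-task percentage (up to 400% on a 4-core box)
  # by the node's core count, derived here by counting the per-core host CPU series.
  # Assumes both metric families share a node_id label.
  nomad_client_allocs_cpu_total_percent
    / on(node_id) group_left()
      count by (node_id) (nomad_client_host_cpu_idle)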

The memory-related metrics for the allocation don't have similar issues, as we can compute utilization by doing nomad_client_allocs_memory_usage / nomad_client_allocs_memory_allocated.
(Btw, nomad_client_allocs_memory_usage is not documented either, or maybe it was renamed from nomad_client_allocs_memory_used.)
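For reference, the memory query I use looks roughly like this in PromQL (metric names as they show up on my scrape; multiply by 100 for a percentage):

  # Fraction of the allocation's memory limit currently in use, per task
  nomad_client_allocs_memory_usage / nomad_client_allocs_memory_allocated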

Questions are:
a) Is this a documentation bug? Or am I just reading this incorrectly?
b) Is there any way we can graph the utilization of an allocation against its CPU resource limit?

tgross commented Apr 22, 2020

Hi @m1keil! Thanks for reporting this!

First of all, it seems like the docs are a bit out of date with regard to the labels & metric names, as they have now changed to nomad.client.allocs.cpu.system/user/total_percent plus labels, similar to how Host Metrics are documented (the post-0.7 change).

Yeah that seems like a stale documentation item. I'll mark this as a documentation bug to fix.

I was expecting a "normalized" value here, i.e. on a 4-core machine, having 1 core at 100% use would lead to 25% in this metric. Instead we get 100% here (so the max is 400%).

That's typically the way you'd see it with other monitoring tools like top, so I think we're trying to be consistent there.

Because Nomad doesn't expose any "number of cores" metric, it's hard to estimate how high the utilization really is.

You should be able to get this via the cpu.numcores metadata on each node (see nomad node status -verbose :node_id). I don't think we typically emit these kinds of stats as telemetry items as they don't change over the lifetime of a client node.

tgross added the type/bug and theme/docs (Documentation issues and enhancements) labels Apr 22, 2020
m1keil commented Apr 22, 2020

Thanks for the response @tgross .

You should be able to get this via the cpu.numcores metadata on each node (see nomad node status -verbose :node_id). I don't think we typically emit these kinds of stats as telemetry items as they don't change over the lifetime of a client node.

Yes, of course, there are ways to get this information outside of the monitoring system. However, without having this info in the telemetry, there's no way to dynamically find how close the allocation is to the max.
Shouldn't this behave in a similar manner to the memory metrics? Those expose the allocated amount of memory - a number that doesn't change during the lifetime of the allocation.

henrikjohansen commented May 19, 2020

@tgross @m1keil The CPU resources you assign in your jobspec or in quota definitions are measured in Mhz, not percent.

%CPU might be interesting for cluster or node operators but not really for the people running jobs on the cluster because of ☝️

Now, the nomad alloc status command reports Mhz:

...
Task "foobar" is "running"
Task Resources
CPU        Memory           Disk     Addresses
0/100 MHz  5.2 MiB/300 MiB  300 MiB  http: 1.2.3.4:12345
...

The UI reports both percent and Mhz (0 Mhz / 100 Mhz reserved)

Since PR6784 added a client.allocs.<job>.<group>.<alloc>.<task>.cpu.allocated metric (still undocumented, though), the only thing needed to solve CPU resource monitoring of allocations would be a client.allocs.<job>.<group>.<alloc>.<task>.cpu.usage metric that reports the amount of Mhz consumed by a task, right?

This would bring CPU telemetry in line with the Memory telemetry, allowing the same procedures to be used for monitoring both.
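If such a metric existed, checking a task against its reservation would mirror the memory query. A minimal PromQL sketch (the .cpu.usage name is the hypothetical metric being proposed here; .cpu.allocated is the one from PR6784, and the Prometheus-style spellings are my assumption):

  # Hypothetical: fraction of the task's reserved Mhz currently being consumed
  nomad_client_allocs_cpu_usage / nomad_client_allocs_cpu_allocated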

m1keil commented May 19, 2020

Since PR6784 added a client.allocs.....cpu.allocated metric (still undocumented though) the only thing needed to solve CPU resource monitoring of allocations would be a client.allocs.....cpu.usage metric that reports the amount of Mhz consumed by a task, right?

I don't think it would, unless cpu_hard_limit is true (in the case of the Docker driver, anyway). The allocated "MHz" is a soft limit by default. An allocation can use 100% of the CPU regardless of the amount of MHz assigned to it (which also translates directly to CPU shares).

Basically I just want to understand if using Nomad's telemetry I can answer the simple question of "is my node hitting the max CPU utilization ceiling?"

henrikjohansen commented May 19, 2020

@m1keil Well, client.allocs.<job>.<group>.<alloc>.<task>.cpu.allocated would give you the resources you have allocated for a given task, let's say 2000 Mhz.

The metric I am suggesting (client.allocs.<job>.<group>.<alloc>.<task>.cpu.usage) would give you the actual utilization in Mhz which indeed might be substantially higher than client.allocs.<job>.<group>.<alloc>.<task>.cpu.allocated.

☝️ is useful for job operators since it exposes the same unit of measurement (Mhz) that is used elsewhere in Nomad (jobspec, quota and the CLI/UI), easily lets you check CPU consumption or whether .cpu.usage > .cpu.allocated, and does not require access to metrics about shared resources such as hosts.

Basically I just want to understand if using Nomad's telemetry I can answer the simple question of "is my node hitting the max CPU utilization ceiling?"

Now, Nomad also exposes host metrics such as nomad.client.allocated.cpu / nomad.client.unallocated.cpu or nomad.client.host.cpu.total / nomad.client.host.cpu.idle, and those are much more useful for node or cluster operators since they let you check the resource consumption of an entire node, a class of nodes, or even entire clusters.

Your question of "is my node hitting the max CPU utilization ceiling?" can thus be answered in either Mhz or % CPU utilization by using the host metrics mentioned above.
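For example, a rough node-wide utilization expression from the host metrics could look something like this (assuming the idle series is a per-core percentage that carries a node_id label - worth verifying against your own scrape):

  # Approximate node-wide CPU utilization in percent
  100 - avg by (node_id) (nomad_client_host_cpu_idle)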

m1keil commented May 20, 2020

@henrikjohansen first of all, thanks for the detailed answers. Much appreciated. Some background about me: I'm all of the above (job and cluster operator).

Regarding nomad.client.allocated.cpu & nomad.client.unallocated.cpu: These metrics are useful from the perspective of "how many more resources I have left to allocate". They do not reflect actual consumption of the resources. These numbers reflect the same info that is shown under Allocated Resources when running nomad node status.

You are totally right about the host metrics. That would answer my question. However, I think I was in a bit of a rush when I was writing that question down.

I want to be able to tell what the current CPU consumption of a single allocation is, and I want that to be a percentage, just like you would see it in top. I want to be able to stack all of the allocations on the same client in order to inspect any potential issues caused by CPU contention.

The metric you are suggesting (.cpu.usage) is only useful if there's a complementary metric which exposes the upper limit in MHz. As far as I understand, there is no such metric.
You can arrive at this number by combining nomad.client.allocated.cpu & nomad.client.unallocated.cpu, so I guess that would work.

But isn't it just easier to make a .cpu.total_percent-like metric that is normalized to core count instead?

henrikjohansen commented May 20, 2020

Regarding nomad.client.allocated.cpu & nomad.client.unallocated.cpu: These metrics are useful from the perspective of "how many more resources I have left to allocate". They do not reflect actual consumption of the resources. These numbers reflect the same info that is shown under Allocated Resources when running nomad node status.

That depends on how you define 'consumption', I guess. You can easily run into situations where Nomad is unable to schedule more work onto a node even though it's essentially idle ... because all available resources are reserved by jobs running on that node (and thus considered consumed).

I want to be able to tell what the current CPU consumption of a single allocation is, and I want that to be a percentage, just like you would see it in top. I want to be able to stack all of the allocations on the same client in order to inspect any potential issues caused by CPU contention.

Personally I don't think that this makes much sense since literally everything else in Nomad uses Mhz.

If you want to see the current CPU consumption of an allocation the suggested metric (.cpu.usage) will provide that in Mhz.

If you want to see if an allocation is consuming more CPU resources than you have reserved in your jobspecs resources stanza(s) you can check if .cpu.usage > cpu.allocated.

If you would like to check if jobs are getting CPU throttled because of CPU contention issues on a node you should use .cpu.throttled_time.

Lastly, using .cpu.usage and client.allocated.cpu + client.unallocated.cpu would let you figure out %CPU usage for a single allocation if you really wanted to :)
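Something along these lines, I suppose (nomad_client_allocs_cpu_usage is again the hypothetical metric spelled Prometheus-style; the node_id join label is an assumption and may differ on your setup):

  # Hypothetical: per-allocation CPU usage as a percentage of the node's
  # total schedulable capacity (allocated + unallocated Mhz)
  100 * nomad_client_allocs_cpu_usage
    / on(node_id) group_left()
      (nomad_client_allocated_cpu + nomad_client_unallocated_cpu)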

As a side note, if you want to check CPU contention issues on a per-node/OS level, then CPU run-queue statistics are as important as %CPU, since contention can occur without 100% CPU utilization.

You can arrive at this number by combining nomad.client.allocated.cpu & nomad.client.unallocated.cpu, so I guess that would work.

Indeed, and IIRC these also correctly handle capacity that you might have reserved in your Nomad agent config.

But isn't it just easier to make a .cpu.total_percent-like metric that is normalized to core count instead?

In our case, no. We use different physical hardware and expose those as node classes in Nomad (think high-cpu, gpu, high-mem, etc.). Now, what does 23% utilization mean on an AMD EPYC node with 256 cores compared to a 32-core Intel node?

Consumption of 25000 Mhz, however, means the same regardless of ☝️

m1keil commented May 20, 2020

It's my understanding that .cpu.throttled_time is only available if cpu_hard_limit is true (link).

Tbh, I see no reason why both use cases can't be answered. I do understand your point; I just don't see it as "either this or that". But it's less important at the moment. Looks like I need to fall back to cAdvisor in order to monitor Docker directly to get these numbers.
