Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add system.cpu.frequency metric. #337

Merged
merged 9 commits into from
Oct 26, 2023

Conversation

ChrsMark
Copy link
Member

@ChrsMark ChrsMark commented Sep 21, 2023

Fixes #130

Changes

This PR adds the remaining of the CPUINFO data described at #130. The respective CPU frequency is added as a metric, called system.cpu.frequency.max based on open-telemetry/opentelemetry-collector-contrib#26532 (comment). See also #337 (comment) for more details.

Example:

On a Linux machine execute the following:

cat /proc/cpuinfo | grep MHz
cpu MHz		: 2252.777
cpu MHz		: 3000.000
cpu MHz		: 1217.024
cpu MHz		: 3000.000
cpu MHz		: 3000.000
cpu MHz		: 3000.000
cpu MHz		: 3000.000
cpu MHz		: 3000.000

As we can see the provided frequencies are per CPU and hence we distinguish this by using the system.cpu.logical_number metric attribute.

Collector's implementation PR: open-telemetry/opentelemetry-collector-contrib#27445

Merge requirement checklist

@ChrsMark ChrsMark requested review from a team September 21, 2023 10:07
@ChrsMark ChrsMark self-assigned this Sep 21, 2023
@ChrsMark ChrsMark changed the title Add cpu frequency metric Add system.cpu.frequency metric. Sep 21, 2023
@ChrsMark ChrsMark requested a review from a team September 21, 2023 13:57
mx-psi
mx-psi previously approved these changes Sep 21, 2023
@frzifus
Copy link
Member

frzifus commented Sep 27, 2023

Is here something missing?

@mx-psi
Copy link
Member

mx-psi commented Sep 27, 2023

Is there something missing?

I think we need to discuss whether this definition is practical. The library we use on the hostmetrics receiver does not support getting the real-time frequency, only the maximum frequency. See open-telemetry/opentelemetry-collector-contrib#26532 (comment)

I have not had time to check whether an alternative exists (the problem seems to be Windows).

@mx-psi mx-psi dismissed their stale review September 27, 2023 16:10

Let's figure out the practicality of implementation first

@ChrsMark ChrsMark marked this pull request as draft October 2, 2023 06:40
@ChrsMark
Copy link
Member Author

ChrsMark commented Oct 2, 2023

Hey folks! So since it seems that its not quite straight-forward to collect the current cpu's frequency in Windows maybe we can introduce the system.cpu.frequency.max instead (based on https://github.com/shirou/gopsutil/blob/2fabf15a16dca198f735a5de2722158576e986a9/cpu/cpu_linux.go#L145-L148).

This would give us a meaningful attribute for now which comes from cpufreq/cpuinfo_max_freq in Linux and is also available for other OS distros, but at the same time gives us the flexibility to add system.cpu.frequency.min and/or system.cpu.frequency.current in the future if we find the need.

@mx-psi @frzifus what do you think?

@mx-psi
Copy link
Member

mx-psi commented Oct 2, 2023

@mx-psi @frzifus what do you think?

Makes sense to me. I think the name should end in .limit instead based on https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#instrument-naming ?

@ChrsMark
Copy link
Member Author

ChrsMark commented Oct 2, 2023

@mx-psi @frzifus what do you think?

Makes sense to me. I think the name should end in .limit instead based on https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#instrument-naming ?

That would be fine but not sure how we would potentially refer to the min's equivalent?
BTW I got the idea from the already existent DB metrics -> https://github.com/open-telemetry/semantic-conventions/blob/main/model/metrics/database-metrics.yaml#L37-L53

@mx-psi
Copy link
Member

mx-psi commented Oct 2, 2023

That would be fine but not sure how we would potentially refer to the min's equivalent?

Right... and taken literally the max frequency is not the 'total amount' of frequency in any sense.

BTW I got the idea from the already existent DB metrics -> https://github.com/open-telemetry/semantic-conventions/blob/main/model/metrics/database-metrics.yaml#L37-L53

Ok then I would go with .max 👍

@frzifus
Copy link
Member

frzifus commented Oct 2, 2023

@mx-psi @frzifus what do you think?

Makes sense to me. I think the name should end in .limit instead based on https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/metrics.md#instrument-naming ?

That would be fine but not sure how we would potentially refer to the min's equivalent? BTW I got the idea from the already existent DB metrics -> https://github.com/open-telemetry/semantic-conventions/blob/main/model/metrics/database-metrics.yaml#L37-L53

Ive no strong preference. But from the outside I think min & max is more expressive.

an instrument that measures the constant, known total amount of something should be called entity.limit. For example, system.memory.limit for the total amount of memory on a system.

Can there be a limit on the frequency which is lower then max? Or is there no way to recognize this and a limit would always be max?

@ChrsMark
Copy link
Member Author

ChrsMark commented Oct 3, 2023

Can there be a limit on the frequency which is lower then max? Or is there no way to recognize this and a limit would always be max?

So based on https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt cpuinfo_max_freq is the maximum operating frequency the processor can run at while scaling_max_freq is the current "policy limits". By echoing new values into these files, you can change these limits.

So from what I can understand scaling_max_freq is the mutable limit that can be set while cpuinfo_max_freq is the actual hardware limit the vendor defines.

If that's correct then maybe this is not a metric but more a host.cpu resource attribute?

Update:
I had a look into Prometheus Node exporter (as a reference) and I see that both cpuinfo and scaling frequencies are provided:

  1. https://github.com/prometheus/node_exporter/blob/8ff3c439221f1906ee9b0b2362e490014286c7e7/collector/cpufreq_common.go#L35
  2. https://github.com/prometheus/node_exporter/blob/8ff3c439221f1906ee9b0b2362e490014286c7e7/collector/cpufreq_common.go#L50

Based on this I would suggest using system.cpu.frequency.max for now and if needed in the future we can add system.cpu.frequency.scaling.* attributes as well.

@ChrsMark ChrsMark force-pushed the add_host_frequency branch from dd71fff to d5cd8c7 Compare October 4, 2023 07:35
@ChrsMark ChrsMark changed the title Add system.cpu.frequency metric. Add system.cpu.frequency.max metric. Oct 4, 2023
@ChrsMark ChrsMark marked this pull request as ready for review October 4, 2023 07:44
@ChrsMark ChrsMark changed the title Add system.cpu.frequency.max metric. Add system.cpu.frequency metric. Oct 5, 2023
Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
@ChrsMark ChrsMark force-pushed the add_host_frequency branch from af7ae48 to f914169 Compare October 5, 2023 14:28
@ChrsMark
Copy link
Member Author

ChrsMark commented Oct 9, 2023

Hey @frzifus @mx-psi ! The collector's implementation PR is open for review: open-telemetry/opentelemetry-collector-contrib#27445.

It seems that we can add support for reporting the current cpu in Linux systems and provide the option for adding support for more platforms in the future accordingly. Hence I have updated this PR to introduce the system.cpu.frequnecy (current frequnecy) so it's back to its original state. Let me know what you think.

CHANGELOG.md Outdated Show resolved Hide resolved
@ChrsMark ChrsMark force-pushed the add_host_frequency branch from d4f011a to 9378a05 Compare October 9, 2023 10:03
@ChrsMark ChrsMark requested a review from mx-psi October 11, 2023 15:18
@mx-psi
Copy link
Member

mx-psi commented Oct 11, 2023

From the WG meeting: how does this metric compare with https://github.com/open-telemetry/semantic-conventions/blob/main/docs/system/hardware-metrics.md#hwcpu---physical-processor-metrics ? In particular:

  • Are they the same? If they are the same, we should have one of them (I would say this one). If they are different, we should clarify how they are different.
  • Should we use Hz instead of MHz for this metric?

For reference, the metrics do not seem to be used https://github.com/search?q=org%3Aopen-telemetry%20hw.cpu&type=code. cc @arminru as the original author of the document in case you have more context.

@ChrsMark
Copy link
Member Author

Thanks for summarizing this @mx-psi!

One problem I see with hw.cpu.speed is that we cannot distinguish it per cpu.logical_number. The current frequencies provided from /proc/cpuinfo are different per processor and this means that if we would decide to use hw.cpu.speed we would need to attach a metric attribute as well to distinguish from which processor the metric is coming from.
Right now we have system.cpu.logical_number which corresponds to this but I'm not sure if we could use it with hw.cpu.speed 🤔 . If we can use it then I think we are fine to go for hw.cpu.speed.
Alternatively since hw.cpu.speed is not used at the moment what would be our chances to deprecate it and use system.cpu.frequency one? @arminru your feedback would help here since you introduced the field back then :).

@arminru
Copy link
Member

arminru commented Oct 12, 2023

Hey @mx-psi @ChrsMark! I reviewed the PR back then and suggested changes which were applied and thus show up in the co-authored-by field but the original proposal came from @bertysentry, so I'll forward the questions to him instead.

@ChrsMark
Copy link
Member Author

@bertysentry do you have any content to share here?

Would that make sense to add system.cpu.logical_number attribute to the hw.cpu.speed and use it instead of introducing the system.cpu.frequency?
Or should we introduce the system.cpu.frequency since we are stabilizing the system.* namespace and deprecate the hw.cpu.speed one? To my mind it's cleaner to have system.cpu.frequency along with the system.cpu.logical_number attribute.

@open-telemetry/semconv-system-approvers thoughts on this?

@ChrsMark
Copy link
Member Author

ChrsMark commented Oct 19, 2023

From the WG meeting(Oct 18th):

  1. We are aligned that system.cpu.frequency should be introduced since it refers to logical cores as well and using just the hw.cpu.speed would be limiting. Also we need to re-use the system.cpu.logical_number in any case.
  2. As a base frequency Hz should be used.

I will update the PR accordingly.

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
@bertysentry
Copy link
Contributor

Sorry for the late reply (I'm on vacation!)

It is true that hw.cpu.speed.limit has been defined for the entire CPU package and doesn't specify a logical processor number, which is a limitation and needs to be fixed (I will take care of this: hw.cpu. metrics should represent individual cores, when possible).

WRT the deprecation of hw.cpu.speed.*, I think we should keep both system.cpu.frequency and this one, because logical processors seen by the operating system are not always the same as the real processors. Think of hyper-threading, and virtual cores allocated to virtualization systems (VMware, qemu, etc.).

NB: The challenge with processors is that the technology evolves quite rapidly, and some metrics that sound logical today will be meaningless in a few years. Example: Intel's RAPL.

@AlexanderWert AlexanderWert merged commit 26f4992 into open-telemetry:main Oct 26, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Define attributes to identify a CPU
8 participants