
[receiver/kubeletstats] Add k8s.container.cpu.node.utilization metric #32295

Merged
merged 3 commits into open-telemetry:main from k8s_container_metrics on May 31, 2024

Conversation

ChrsMark
Member

@ChrsMark ChrsMark commented Apr 10, 2024

Description:

At the moment, we calculate the `k8s.container.cpu_limit_utilization` as a ratio of the container's limits at
`cpuMetrics.LimitUtilization(mb, currentTime, value/r.cpuLimit)`.

Similarly, we can calculate the CPU utilization as a ratio of the whole node's allocatable CPU if we divide by the total number of the node's cores.

We can retrieve this information from the Node's `Status.Capacity`, for example:

$ k get nodes kind-control-plane -ojsonpath='{.status.capacity}'
{"cpu":"8","ephemeral-storage":"485961008Ki","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"32564732Ki","pods":"110"}

Performance concerns

In order to get the Node's capacity we need an extra call to the k8s API to fetch the Node object.
Something to consider here is the performance impact that this extra API call would bring. We can always choose to have this metric disabled by default and clearly state in the docs that it comes with an extra API call to fetch the Node of the Pods.
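
For illustration, enabling such an opt-in metric would look roughly like the snippet below. This is only a sketch following the receiver's usual `metrics:` pattern; the exact option names (for example the `k8s_api_config` block) should be checked against the receiver README:

```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount
    endpoint: "${env:K8S_NODE_NAME}:10250"
    # Assumed option: API access is needed to read the Node's capacity.
    k8s_api_config:
      auth_type: serviceAccount
    metrics:
      # Optional metric, disabled by default.
      k8s.container.cpu.node.utilization:
        enabled: true
```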

The good thing is that the kubeletstats receiver targets only one node, so I believe it's a safe assumption to fetch only the current node, because all the observed Pods will belong to that single local node. Correct me if I'm missing anything here.

In addition, instead of performing the API call explicitly on every single scrape, we can use an informer and leverage its cache. I can change this patch in that direction if we agree on this.
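
As a rough illustration of the informer approach, here is a minimal client-go sketch that watches only the local Node and reads its CPU capacity from the cache. It is an assumption-laden sketch, not the actual implementation (the PR ends up reusing the shared node informer helper moved out of the k8sattributes processor):

```go
import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// watchNodeCapacity builds a shared informer scoped to a single node and logs
// the node's CPU capacity whenever the Node object appears in the cache.
func watchNodeCapacity(client kubernetes.Interface, nodeName string, stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		10*time.Minute, // resync period; between resyncs reads are served from the cache
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			// Watch only the local node instead of every node in the cluster.
			opts.FieldSelector = fmt.Sprintf("metadata.name=%s", nodeName)
		}),
	)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			node := obj.(*corev1.Node)
			cpu := node.Status.Capacity[corev1.ResourceCPU]
			fmt.Printf("node %s CPU capacity: %d cores\n", node.Name, cpu.Value())
		},
	})
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
}
```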

Would love to hear others' opinions on this.

Todos

✅ 1) Apply this change behind a feature gate, as indicated at #27885 (comment)
✅ 2) Use an Informer instead of direct API calls.

Link to tracking Issue:
ref: #27885

Testing:

I experimented with this approach and the results look correct. To verify this, I deployed a stress Pod on my machine to consume a target CPU of 4 cores:

apiVersion: v1
kind: Pod
metadata:
  name: cpu-stress
spec:
  containers:
  - name: cpu-stress
    image: polinux/stress
    command: ["stress"]
    args: ["-c", "4"]

And then the collected `container.cpu.utilization` for that Pod's container was at 0.5, as expected, given that my machine (node) comes with 8 cores in total:

![cpu-stress](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/11754898/3abe4a0d-6c99-4b4e-a704-da5789dde01b)

Unit test is also included.

Documentation:

Added: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32295/files#diff-8ad3b506fb1132c961e8da99b677abd31f0108e3f9ed6999dd96ad3297b51e08

@jinja2
Contributor

jinja2 commented May 29, 2024

It looks like we'd be introducing more combinations of utilization metrics for different resource types (cpu/memory/ephemeral_storage) for both k8s.container and k8s.pod. Are pre-computed utilization metrics for different limits, like we have in this receiver for container, pod, and now node, useful to users?

I understand one of the arguments for doing the utilization calculation in this receiver has been that users want all resource usage metrics from the same receiver; currently users need to run both the kubeletstats (reports usage for containers) and the k8scluster (reports the node's capacity/allocatable) receivers to be able to calculate the utilization. Would we instead consider reporting the node's capacity metric from the kubeletstats receiver instead of doing utilization calculations in the receiver? With the node's capacity/allocatable and the pod's/container's resource usage (we have to make sure the node id is available for filtering) reported from the kubeletstats receiver, the user has the option to calculate the utilization in their backend observability platform. Imo, this gives users more flexibility in deciding what they consider as the limit, or whether they want to track it per container or per pod.

@ChrsMark
Member Author

Are pre-computed utilization metrics for different limits, like we have in this receiver for container, pod, and now node, useful to users?

From my point of view, yes :)

Would we instead consider reporting the node's capacity metric from the kubeletstats receiver instead of doing utilization calculations in the receiver?

Wouldn't that mean that we will have the same metric potentially emitted from 2 different receivers at the same time?

In general, that was already mostly covered back in #24905 (comment), when the k8s.container.cpu_limit_utilization and the rest were proposed and added.

From my perspective agreeing on unambiguous metrics (even as optional) that we can provide ootb is the goal here. If we pass the responsibility to users/backends to do such calculations/namings then we kind of start losing the vendor/backend agnostic idea.

Note: the k8s.pod.cpu_limit/request_utilization metrics are not provided at all if there are containers with no limit/request set. That means the utilization metric as a ratio against the node's "limit" is the one that will always be there.

@ChrsMark
Member Author

Based on the discussions at #32295 (comment) I have pushed a change to rename the metric to k8s.container.cpu.node.utilization.
That's better aligned with the SemConv guidelines, and it also preserves `k8s.container.cpu.node.*` as a namespace in case we want to add an additional metric in the future to cover the allocatable-related ratio.

@andrzej-stencel @povilasv @TylerHelmuth @dmitryax @jinja2 please take another look.

@ChrsMark ChrsMark force-pushed the k8s_container_metrics branch 2 times, most recently from 9ac3346 to 42fc09e Compare May 30, 2024 07:39
@andrzej-stencel andrzej-stencel changed the title [receiver/kubeletstats] Add k8s.container.cpu.node_limit_utilization metric [receiver/kubeletstats] Add k8s.container.cpu.node.utilization metric May 30, 2024
@ChrsMark ChrsMark force-pushed the k8s_container_metrics branch 3 times, most recently from 3b543df to 8b224eb Compare May 30, 2024 18:38
@jinja2
Contributor

jinja2 commented May 30, 2024

Wouldn't that mean that we will have the same metric potentially emitted from 2 different receivers at the same time?

I think it is acceptable to have some metrics reported by both the k8scluster and kubeletstats receivers. The k8scluster receiver is run as a deployment, so usually only one instance of the receiver is doing the watching/reporting for all nodes/pods in the cluster. We have seen missed updates with massive clusters or clusters with a higher churn rate, etc. The kubeletstats receiver reporting the limit/request and/or node capacity/allocatable for the local node and pods would be a beneficial addition for users who do not want to run both receivers if they are only looking for metrics related to resource usage, and also a more scalable alternative to the k8scluster receiver.

From my perspective agreeing on unambiguous metrics (even as optional) that we can provide ootb is the goal here. If we pass the responsibility to users/backends to do such calculations/namings then we kind of start losing the vendor/backend agnostic idea.

Tbh, I don't think this pre-computed metric adds much value for k8s users. If the goal is to make sure that users have the option of comparing usage against the node's capacity, then reporting just the capacity instead of the pre-computed utilization is a more flexible option, because it allows k8s admins and users to calculate "utilization" as they see fit for their k8s setup. I can think of a few scenarios where a k8s admin would calculate usage against the node's capacity/allocatable, but these are not in line with the metric being added here. For example, when looking at a node's utilization to determine whether an admin needs to add more capacity, I am more likely to sum the resource requests of all pods on the node and compare that against the node's allocatable, rather than look at the actual usage of the containers, which kind of does not matter: even if a container is using only 1Gi out of a 10Gi request, the node has still reserved that 10Gi and it is not usable by any other container on the node. As an application developer deploying to a k8s cluster, I am more likely to look at the utilization wrt my pod's request/limit instead of the node's capacity. And if I have not set a limit on my container, the amount of resource such a container is technically allowed to use is the node's allocatable - sum(requests of all other pods on this node). These are just some examples of how a k8s user/admin might think of a limit as imposed by the node's availability of resources. I agree that there is probably a good use case to measure usage of a container's cpu wrt the node's capacity, but there are plenty of such combinations when we try to compute utilization wrt the k8s node's available resources which might be equally or more useful imo, and a good way to accommodate these different scenarios is to keep the metrics more flexible instead of pre-computing.
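
For concreteness, here is a rough sketch of the request-based calculation described above; the types and names are purely illustrative and not part of any receiver's API:

```go
// Illustrative only: node "utilization" computed as the sum of pod CPU
// requests over the node's allocatable CPU, rather than actual container usage.
func requestBasedNodeUtilization(podCPURequestsCores []float64, nodeAllocatableCores float64) float64 {
	if nodeAllocatableCores == 0 {
		return 0 // allocatable unknown; avoid division by zero
	}
	var requested float64
	for _, r := range podCPURequestsCores {
		requested += r
	}
	return requested / nodeAllocatableCores
}
```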

@ChrsMark
Member Author

ChrsMark commented May 31, 2024

I agree that there is probably a good use case to measure usage of a container's cpu wrt the node's capacity, but there are plenty of such combinations when we try to compute utilization wrt the k8s node's available resources which might be equally or more useful imo, and a good way to accommodate these different scenarios is to keep the metrics more flexible instead of pre-computing.

Thanks for the additional information!

I guess users are always able to apply such calculations in their back-ends if they want to, even with the metrics that are provided today. As such, I don't see these ideas as blockers for this optional metric.

For reference, this one has been used in Kibana for years now, so being able to retrieve it ootb through the Collector would only be of benefit. Having the utilization calculated against a hard limit is a valuable, non-relative indicator to observe over time, build alerts/reports on top of, etc. It can also come in handy to quickly drill down to specific workloads while investigating infra resource utilization issues.

Something not to underestimate is that query aggregations like those mentioned can sometimes become "expensive", so I see value in having the option to do such pre-calculations on the edge. I would be more than happy to see any additional ones that we deem appropriate in the future.

Ultimately, providing this kind of metric as optional is flexible enough: users that want to calculate them in their back-ends can do so, while those that want them pre-calculated on the edge still have the option for that.

…metric

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
@andrzej-stencel
Member

I agree with @ChrsMark that this metric is valuable.

The discussion around naming and exact meaning (computing against the node's capacity vs. its allocatable CPU) would have been much easier if the semantic conventions were already specified for k8s. However, I don't think we should block adding new, useful metrics to the collector until semantic conventions exist for them. This is also not what we have been doing historically.

Ideally, I would have each metric created by this receiver (and other receivers) document its semantic conventions status, or whether a semantic convention even exists for the metric. That would make it easier for users to reason about a specific metric's stability. Given that the receiver as a whole is in beta stability, we would like users to be aware of the fact that not all metrics in the component might be equally stable. If we want to do this, it should be a separate PR that does not block this one or others.

@andrzej-stencel andrzej-stencel merged commit 4dd718b into open-telemetry:main May 31, 2024
162 checks passed
@github-actions github-actions bot added this to the next release milestone May 31, 2024
TylerHelmuth added a commit that referenced this pull request Jun 14, 2024
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
This PR adds the `k8s.pod.cpu.node.utilization` metric.
Follow up from
#32295 (comment)
(cc @TylerHelmuth) .

**Link to tracking Issue:** <Issue number if applicable>
Related to
#27885.

**Testing:** <Describe what testing was performed and which tests were
added.> Adjusted the respective unit test to cover this metric as well.

**Documentation:** <Describe the documentation added.> Added

Tested with a single container Pod:


![podCpu](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/11754898/9a0069c2-7077-4944-93b6-2dde00979bf3)

---------

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com>
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
t00mas pushed a commit to t00mas/opentelemetry-collector-contrib that referenced this pull request Jun 18, 2024
…-telemetry#33390)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
This PR adds the `k8s.pod.cpu.node.utilization` metric.
Follow up from
open-telemetry#32295 (comment)
(cc @TylerHelmuth) .

**Link to tracking Issue:** <Issue number if applicable>
Related to
open-telemetry#27885.

**Testing:** <Describe what testing was performed and which tests were
added.> Adjusted the respective unit test to cover this metric as well.

**Documentation:** <Describe the documentation added.> Added

Tested with a single container Pod:


![podCpu](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/11754898/9a0069c2-7077-4944-93b6-2dde00979bf3)

---------

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com>
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
andrzej-stencel pushed a commit that referenced this pull request Jul 10, 2024
…ion` metrics (#33591)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
Similar to
#32295
and
#33390,
this PR adds the `k8s.{container,pod}.memory.node.utilization` metrics.


**Link to tracking Issue:** <Issue number if applicable>
#27885

**Testing:** <Describe what testing was performed and which tests were
added.> Added unit test.

**Documentation:** <Describe the documentation added.> Added

### Manual testing

1. Using the following target Pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "8070591Ki"
      limits:
        memory: "9070591Ki"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "800M", "--vm-hang", "4"]
```
2. 

![memGood](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/11754898/fae04b30-59ca-4d70-8446-f54b5a085cf7)

On a node with 32.5G of memory, the 800Mb container/Pod consumes
`0.8/32.5 = 0.0246... ≈ 0.025`.

---------

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this pull request Jul 11, 2024
…etry#32591)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
This PR implements some basic refactoring required for
open-telemetry#32295.
Doing this in a standalone PR was requested at
open-telemetry#32295 (comment).

The change moves a library method called `newNodeSharedInformer` from
the `k8sattributes` processor package to a generic one since it can be
re-used by multiple various components.

Note: I'm not sure if a changelog is required for this change, so I left
it out for now.

**Link to tracking Issue:** <Issue number if applicable>

**Testing:** <Describe what testing was performed and which tests were
added.>

**Documentation:** <Describe the documentation added.>

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this pull request Jul 11, 2024
…ic (open-telemetry#32295)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

At the moment, we calculate the `k8s.container.cpu_limit_utilization` as
[a ratio of the container's
limits](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/kubeletstatsreceiver/documentation.md#k8scontainercpu_limit_utilization)
at
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/867d6700c31446172e6998e602c55fbf7351831f/receiver/kubeletstatsreceiver/internal/kubelet/cpu.go#L30.

Similarly we can calculate the cpu utilization as ratio of the whole
node's allocatable cpu, if we divide by the total number of node's
cores.

We can retrieve this information from the Node's `Status.Capacity`, for
example:

```console
$ k get nodes kind-control-plane -ojsonpath='{.status.capacity}'
{"cpu":"8","ephemeral-storage":"485961008Ki","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"32564732Ki","pods":"110"}
```




## Performance concerns

In order to get the Node's capacity we need an API call to the k8s API
in order to get the Node object.
Something to consider here is the performance impact that this extra API
call would bring. We can always choose to have this metric as disabled
by default and clearly specify in the docs that this metric comes with
an extra API call to get the Node of the Pods.

The good thing is that the `kubeletstats` receiver targets only one node, so
I believe it's a safe assumption to fetch only the current node because
all the observed Pods will belong to the one single local node. Correct
me if I'm missing anything here.

In addition, instead of performing the API call explicitly on every
single `scrape` we can use an informer instead and leverage its cache. I
can change this patch to this direction if we agree on this.

Would love to hear others' opinions on this.

## Todos

✅ 1) Apply this change behind a feature gate as it was indicated at
open-telemetry#27885 (comment)
✅  2) Use an Informer instead of direct API calls.

**Link to tracking Issue:** <Issue number if applicable>
ref:
open-telemetry#27885

**Testing:** <Describe what testing was performed and which tests were
added.>

I experimented with this approach and the results look correct. In order
to verify this I deployed a stress Pod on my machine to consume a target
cpu of 4 cores:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-stress
spec:
  containers:
  - name: cpu-stress
    image: polinux/stress
    command: ["stress"]
    args: ["-c", "4"]
```

And then the collected `container.cpu.utilization` for that Pod's
container was at `0.5` as expected, given that my machine (node) comes
with 8 cores in total:


![cpu-stress](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/11754898/3abe4a0d-6c99-4b4e-a704-da5789dde01b)

Unit test is also included.

**Documentation:** <Describe the documentation added.>

Added:
https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/32295/files#diff-8ad3b506fb1132c961e8da99b677abd31f0108e3f9ed6999dd96ad3297b51e08

---------

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this pull request Jul 11, 2024
…-telemetry#33390)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
This PR adds the `k8s.pod.cpu.node.utilization` metric.
Follow up from
open-telemetry#32295 (comment)
(cc @TylerHelmuth) .

**Link to tracking Issue:** <Issue number if applicable>
Related to
open-telemetry#27885.

**Testing:** <Describe what testing was performed and which tests were
added.> Adjusted the respective unit test to cover this metric as well.

**Documentation:** <Describe the documentation added.> Added

Tested with a single container Pod:


![podCpu](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/11754898/9a0069c2-7077-4944-93b6-2dde00979bf3)

---------

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
Co-authored-by: Tiffany Hrabusa <30397949+tiffany76@users.noreply.github.com>
Co-authored-by: Tyler Helmuth <12352919+TylerHelmuth@users.noreply.github.com>
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this pull request Jul 11, 2024
…ion` metrics (open-telemetry#33591)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
Similar to
open-telemetry#32295
and
open-telemetry#33390,
this PR adds the `k8s.{container,pod}.memory.node.utilization` metrics.


**Link to tracking Issue:** <Issue number if applicable>
open-telemetry#27885

**Testing:** <Describe what testing was performed and which tests were
added.> Added unit test.

**Documentation:** <Describe the documentation added.> Added

### Manual testing

1. Using the following target Pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "8070591Ki"
      limits:
        memory: "9070591Ki"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "800M", "--vm-hang", "4"]
```
2. 

![memGood](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/11754898/fae04b30-59ca-4d70-8446-f54b5a085cf7)

On a node with 32.5G of memory, the 800Mb container/Pod consumes
`0.8/32.5 = 0.0246... ≈ 0.025`.

---------

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>