Add GPU note on Efficiency dashboard (#1156)
Signed-off-by: chipzoller <chipzoller@gmail.com>
Co-authored-by: Mike Murphy <mike@kubecost>
Co-authored-by: Sean Pomeroy <spom@kubecost.com>
Co-authored-by: chipzoller <chipzoller@gmail.com>
4 people authored Jan 4, 2025
1 parent ce7bdb1 commit b0609a4
Showing 2 changed files with 6 additions and 2 deletions.
@@ -14,6 +14,10 @@ The report provides three views for examining the efficiency of your infrastruct

Let’s establish some definitions and then explore each of these views.

+{% hint style="note" %}
+In order for the GPU column to be visible on the Efficiency dashboard, Kubecost must be collecting GPU metrics and observing a container actively using a GPU. See the [NVIDIA GPU Monitoring Configurations](/install-and-configure/advanced-configuration/gpu.md) page for more details on how to set up GPU monitoring for NVIDIA GPUs.
+{% endhint %}
+
## Definitions

- _**Workload Idle**_- Workload Idle is defined as the cost of resources which are requested, but not used, by Kubernetes workloads.
@@ -32,13 +32,13 @@ Clicking on each recommendation tile displays a window with further details on t

## Known Limitations

-In the first version of the GPU Optimization Savings Insights card there are a few limitations of which to be aware.
+In the first version of the GPU Optimization Savings Insights card there are a few known limitations.

- Multiple containers with the same name and running on the same cluster, node, and namespace combination (i.e., "identical" containers) might result in the following effects:
  - The savings number provided on Optimize and Remove cards may be an implicit sum of the total cost of these containers.
  - Recommendations will only be provided for one of them.
  - The utilization table may not show these identical containers.
- A GPU node must be running, or have previously run, at least one container utilizing a GPU to be represented on the utilization table, either in the Cluster aggregation’s GPU nodes column or in the Node aggregation.
-- Optimize may be as accurate as possible in certain cases since Kubecost currently infers utilization about all GPUs from a single averaged utilization number.
+- The Optimize recommendation may not be as accurate as possible in certain cases since Kubecost currently infers utilization about all GPUs from a single averaged utilization number.
- For upgrades from prior versions to 2.5.0, there may be cases where Max. GPU Utilization is a smaller percentage than Avg. GPU Utilization. This will self-correct once the chosen window size is smaller than the time the 2.5.0 instance has been collecting the new max. GPU util. metric.
- The GPU Optimization card on the Savings Insights screen may initially appear greyed out. Click the meatballs icon in the upper right and choose "Unarchive" to make the card appear as the others.
