vgpu-monitor-metrics does not show in grafana #410
Comments
You need to create a Prometheus |
Hi @chaunceyjiang, I have now started the Prometheus server, and it is scraping both the dcgm-exporter metrics and the vGPU exporter metrics. As this Grafana guide shows, I think all I need to do is import the provided Grafana JSON and select the existing Prometheus data source. Is that right? |
You need to create a Prometheus ServiceMonitor.
Because the dcgm-exporter includes a ServiceMonitor. |
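For readers following along, here is a minimal sketch of what such a ServiceMonitor could look like, built as a plain Python dict and dumped as JSON (which `kubectl apply -f` also accepts). The `apiVersion`, `kind`, and `spec` fields follow the Prometheus Operator ServiceMonitor resource; the namespace, the label selector, and the port name are assumptions for illustration and must be replaced with whatever the monitor Service in your cluster actually uses.

```python
import json


def build_service_monitor(name, namespace, match_labels, port_name, interval="15s"):
    """Return a Prometheus Operator ServiceMonitor manifest as a plain dict."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Selects the Service(s) whose labels match.
            "selector": {"matchLabels": dict(match_labels)},
            # Restricts the search to the Service's namespace.
            "namespaceSelector": {"matchNames": [namespace]},
            # "port" is the *name* of the Service port that serves /metrics.
            "endpoints": [
                {"port": port_name, "path": "/metrics", "interval": interval}
            ],
        },
    }


manifest = build_service_monitor(
    name="hami-device-plugin-monitor",
    namespace="kube-system",  # assumption: namespace where HAMi is installed
    match_labels={"app.kubernetes.io/component": "hami-device-plugin"},  # hypothetical label
    port_name="monitorport",  # hypothetical port name
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Saving the printed JSON to a file and applying it creates the resource. Depending on how the Prometheus Operator is configured, the ServiceMonitor itself may also need labels that match the operator's `serviceMonitorSelector`.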
hi @chaunceyjiang ,
here is servicemonitor.yaml
verify:
Now Grafana shows data in only one vGPU panel; the others still show no data. |
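When some panels stay empty, one thing worth checking is whether Prometheus actually has a healthy scrape target for each exporter. The sketch below summarizes the response of the Prometheus HTTP API endpoint `/api/v1/targets` by job. The base URL and the sample payload are made up for illustration; only the response shape (`data.activeTargets[].labels.job`, `scrapeUrl`, `health`) comes from the Prometheus API.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_targets(base_url):
    """Fetch the target list from a Prometheus server, e.g. http://localhost:9090."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets") as resp:
        return json.load(resp)


def summarize_targets(payload):
    """Group active targets by job name as (scrapeUrl, health) pairs."""
    summary = defaultdict(list)
    for target in payload.get("data", {}).get("activeTargets", []):
        job = target.get("labels", {}).get("job", "<unknown>")
        summary[job].append((target.get("scrapeUrl"), target.get("health")))
    return dict(summary)


# Made-up payload in the shape returned by /api/v1/targets.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "hami-device-plugin-monitor"},
                "scrapeUrl": "http://10.0.0.5:9394/metrics",  # hypothetical address
                "health": "up",
            },
            {
                "labels": {"job": "dcgm-exporter"},
                "scrapeUrl": "http://10.0.0.5:9400/metrics",  # hypothetical address
                "health": "down",
            },
        ]
    },
}

summary = summarize_targets(sample)
for job, targets in summary.items():
    for url, health in targets:
        print(f"{job}: {url} is {health}")
```

Against a live server, `summarize_targets(fetch_targets("http://localhost:9090"))` gives the same kind of summary. A job that is missing or reported as `down` points to a scrape problem rather than a dashboard problem.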
If the |
What value did you set for 'nvidia.com/gpucores'? |
I did not set 'nvidia.com/gpucores'; I only set the memory. |
Could you try setting a value for |
hi @chaunceyjiang
This pod requests 5 gpucores, but the monitor shows it using more than 5, and the value reported inside the pod is higher than the one from vgpuMonitor. Is this a bug or something else? Thanks. |
I understand that GPU utilization is not the same as the core percentage. Is there a metric that shows whether the pod really uses only its fixed core percentage? |
1. Issue or feature description
The vGPU monitor metrics do not show up in Grafana; the dcgm-exporter metrics are fine.
Prometheus scrapes the metrics as shown below:

I created a vGPU pod as shown below:

The Grafana dashboard looks like this:


2. Steps to reproduce the issue
3. Information to attach (optional if deemed irrelevant)
Some things in the README should be updated:
1、The export-name for the Prometheus configuration in current HAMi is hami-device-plugin-monitor, not vgpu-device-plugin-monitor.
2、A ServiceAccount part should be added for the monitor.
3、The Grafana JSON should be updated so that the Prometheus data source can be selected.
Common error checking:
nvidia-smi -a on your host
Docker configuration file (e.g. /etc/docker/daemon.json)
Kubelet logs on the node (e.g. sudo journalctl -r -u kubelet)
Additional information that might help better understand your environment and reproduce the bug:
docker version
uname -a
dmesg