Fix gpu-exporter and prometheus demo #1087

Merged: 1 commit, merged May 29, 2024.
docs/top/prometheus.md (2 changes: 0 additions & 2 deletions)

@@ -32,8 +32,6 @@ $ kubectl apply -f kubernetes-artifacts/prometheus/gpu-exporter.yaml

!!! note

* the prometheus and gpu-exporter components should be deployed in namespace ``kube-system``, and so that ``arena top job <job name>`` can work.

* if the your prometheus has been existed in cluster,please make sure the k8s service whose port is 9090 has the label `kubernetes.io/service-name=prometheus-server`

3\. You can check the GPU metrics by prometheus SQL request
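For the second point of the note, a Service that exposes an existing Prometheus on port 9090 with the required label might look like the sketch below. It is a minimal example under stated assumptions: the Service name `prometheus-server`, the selector `app: prometheus`, and the port name `http` are hypothetical and must match your own Prometheus deployment. Only the label `kubernetes.io/service-name=prometheus-server`, the port 9090, and the `kube-system` namespace come from the note.

```yaml
# Hypothetical Service for a Prometheus that already runs in the cluster.
apiVersion: v1
kind: Service
metadata:
  name: prometheus-server          # assumed name; use your existing Service name
  namespace: kube-system           # namespace named in the note
  labels:
    kubernetes.io/service-name: prometheus-server   # label required by the note
spec:
  selector:
    app: prometheus                # hypothetical: must match your Prometheus pod labels
  ports:
  - name: http                     # assumed port name
    port: 9090                     # port required by the note
    targetPort: 9090
```

If the Service already exists, adding the label to its metadata is enough; the selector and ports stay as they are.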
kubernetes-artifacts/prometheus/gpu-exporter.yaml (14 changes: 4 additions & 10 deletions)

```diff
@@ -21,17 +21,13 @@ spec:
         operator: Exists
       hostPID: true
       volumes:
-      - hostPath:
-          path: /var/run/docker.sock
-          type: FileOrCreate
-        name: docker-sock
       - hostPath:
           path: /run/containerd/containerd.sock
-          type: FileOrCreate
+          type: Socket
         name: containerd-sock
       containers:
       - name: node-gpu-exporter
-        image: registry.cn-hangzhou.aliyuncs.com/acs/gpu-prometheus-exporter:0.1-0e21b28
+        image: registry.cn-hangzhou.aliyuncs.com/acs/gpu-prometheus-exporter:v1.0.1-b2c2f9b
         imagePullPolicy: Always
         ports:
         - containerPort: 9445
@@ -40,11 +36,9 @@ spec:
             memory: 30Mi
             cpu: 100m
           limits:
-            memory: 50Mi
-            cpu: 200m
+            memory: 2000Mi
+            cpu: 1000m
         volumeMounts:
-        - mountPath: /var/run/docker.sock
-          name: docker-sock
         - mountPath: /run/containerd/containerd.sock
           name: containerd-sock
 
```
kubernetes-artifacts/prometheus/prometheus.yaml (6 changes: 3 additions & 3 deletions)

```diff
@@ -7,13 +7,13 @@ data:
   storage-retention: 360h
 ---
 
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
   name: prometheus
   namespace: arena-system
 rules:
-- apiGroups: ["", "extensions", "apps"]
+- apiGroups: [""]
   resources:
   - nodes
   - nodes/proxy
@@ -32,7 +32,7 @@ metadata:
   name: prometheus
   namespace: arena-system
 ---
-apiVersion: rbac.authorization.k8s.io/v1beta1
+apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRoleBinding
 metadata:
   name: prometheus
```