A Prometheus exporter for Slurm, built with the Prometheus Python client library. Here is a sample snapshot of the dashboard.
This repository contains collectors that gather statistics from Slurm and export them to Prometheus. Each collector focuses on a specific aspect of Slurm. The collectors are based on the existing FASRC collectors at https://github.com/fasrc/prometheus-slurm-exporter.
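Every collector ultimately serves metrics in the Prometheus text exposition format. The Prometheus Python client generates this output for you. The standard-library sketch below only shows what a labelled gauge looks like on the wire. The helper names `render_gauge` and `escape_label` and the metric name `slurm_kempner_nodes` are hypothetical and do not come from this repository.

```python
# Hypothetical sketch of the Prometheus text exposition format for one gauge.
# The real collectors rely on the Prometheus Python client to produce this.


def escape_label(value):
    # Label values must escape backslash, double quote and newline.
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge family; samples is a list of (labels_dict, value)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{escape_label(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_gauge("slurm_kempner_nodes", "Kempner nodes by state",
                       [({"state": "idle"}, 3), ({"state": "down"}, 1)]), end="")
```

Prometheus scrapes this text from each exporter's `/metrics` endpoint. Label values are escaped because a stray quote would break the whole scrape.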
This collector retrieves the current state of all Kempner nodes in the cluster and calculates overall cluster statistics, such as the number of nodes that are down or in use. Metrics are defined and processed in the Python script slurm_kempner_node_status_collector.py.
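The node-status summary can be sketched as parsing node-oriented `sinfo` output and bucketing states. This is a minimal sketch that assumes input shaped like the output of `sinfo -h -N -o "%N %T"` (node name, extended state). The state groupings in `IN_USE_STATES` and `DOWN_STATES`, the helper names, and the node names in the example are hypothetical. They are not taken from slurm_kempner_node_status_collector.py.

```python
from collections import Counter

# Hypothetical state groupings; the real collector may classify differently.
IN_USE_STATES = {"allocated", "mixed", "completing"}
DOWN_STATES = {"down", "drained", "draining", "fail", "failing"}


def parse_sinfo(output):
    """Count nodes per state from `sinfo -h -N -o "%N %T"` style output."""
    counts = Counter()
    seen = set()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        node, state = fields
        if node in seen:
            continue  # `sinfo -N` prints one line per node *and* partition
        seen.add(node)
        # Strip state flag suffixes such as "*" (not responding) or "~".
        counts[state.rstrip("*~#!%$@^-+").lower()] += 1
    return counts


def summarize(counts):
    """Collapse per-state counts into cluster-level totals."""
    return {
        "total": sum(counts.values()),
        "in_use": sum(n for s, n in counts.items() if s in IN_USE_STATES),
        "down": sum(n for s, n in counts.items() if s in DOWN_STATES),
        "idle": counts.get("idle", 0),
    }
```

In a live collector, the input would come from something like `subprocess.run(["sinfo", "-h", "-N", "-p", partitions, "-o", "%N %T"], capture_output=True, text=True, check=True).stdout`, where `partitions` is a comma-separated partition list. Deduplicating by node name keeps nodes that belong to several partitions from being counted twice.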
This collector gathers the current showq information for all Kempner partitions. See the Python script slurm_kempner_partitionstats_collector.py for metric details.
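The per-partition snapshot can be illustrated by aggregating running and pending work by partition. The collector itself reads showq information. Because the showq output layout is not described here, this sketch makes a different assumption: it expects one job per line in the form `partition state cpus`, similar to what `squeue -h -o "%P %T %C"` prints. The function name `partition_stats`, the stat keys, and the partition names in the example are hypothetical.

```python
from collections import defaultdict


def partition_stats(output):
    """Aggregate running/pending job and CPU counts per partition.

    Assumes lines like "partition_a RUNNING 16" (partition, state, CPUs).
    """
    stats = defaultdict(lambda: {"running_jobs": 0, "pending_jobs": 0,
                                 "running_cpus": 0, "pending_cpus": 0})
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        partition, state, cpus = fields
        key = {"RUNNING": "running", "PENDING": "pending"}.get(state)
        if key is None:
            continue  # ignore COMPLETING, SUSPENDED and other states
        entry = stats[partition]
        entry[f"{key}_jobs"] += 1
        entry[f"{key}_cpus"] += int(cpus)
    return dict(stats)
```

Each resulting entry maps naturally to a gauge with a `partition` label, so a dashboard can plot queue depth per partition.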
This collector pulls historical usage data using sacct for all Kempner partitions. Metrics are defined in the Python script slurm_kempner_partitionstats_collector.py.
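One common historical metric is CPU-hours per partition, computed from sacct accounting records. This is a minimal sketch that assumes pipe-delimited lines shaped like the output of `sacct -a -X -n -P -S <start> -E <end> -r <partitions> --format=JobID,Partition,ElapsedRaw,AllocCPUS`. The field selection and the function name `cpu_hours_by_partition` are hypothetical and may not match the metrics the real collector defines.

```python
from collections import defaultdict


def cpu_hours_by_partition(output):
    """Sum allocated CPU-hours per partition from `sacct -P` output.

    Expects lines "JobID|Partition|ElapsedRaw|AllocCPUS"; ElapsedRaw is in seconds.
    """
    totals = defaultdict(float)
    for line in output.splitlines():
        fields = line.split("|")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        _job_id, partition, elapsed_s, alloc_cpus = fields
        if not elapsed_s or not alloc_cpus:
            continue  # accounting fields can be empty for some records
        totals[partition] += int(elapsed_s) * int(alloc_cpus) / 3600.0
    return dict(totals)
```

`-X` restricts sacct to job allocations, so job steps are not double-counted, and `-P` yields a stable `|` delimiter that is easy to split.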
Sample dashboards for the various collectors are available on the dashboard server.