Rename salt_state_health to salt_function_health
Because the metric is not only for states.
Note: a state always has an associated function.
kpetremann committed Apr 17, 2023
1 parent 175a3f2 commit fbc4741
Showing 3 changed files with 25 additions and 20 deletions.
24 changes: 15 additions & 9 deletions README.md
@@ -97,26 +97,32 @@ salt_new_job_total{function="state.sls",state="test",success="false"} 1
salt_new_job_total{function="state.single",state="test.nop",success="true"} 3
```

### Health Minions metrics
By default, the state.highstate will also generate a health metrics:
### Minions job status

By default, a Salt highstate will generate a status metric:
```
salt_state_health{function="state.highstate",minion="node1",state="highstate"} 1
salt_function_status{function="state.highstate",minion="node1",state="highstate"} 1
```
* `1` mean that the last time this couple of function/state were called, the return was `successful`
* `0` mean that the last time this couple of function/state were called, the return was `failed`
* `1` means that the last time this couple of function/state were executed, the return was `successful`
* `0` means that the last time this couple of function/state were executed, the return was `failed`

You will find a example of prometheus alerts that could be used with these default metrics in the prometheus_alerts directory.
You will find an example of Prometheus alerts that could be used with this metric in the `prometheus_alerts` directory.

The health metrics can be customized by using the -health-functions-filter and -health-states-filter, example of usage:
The health metrics can be customized by using the `-health-functions-filter` and `-health-states-filter`, example of usage:
```
./salt-exporter -health-states-filter=test.ping,state.apply -health-functions-filter=""
```
This will only generate health minion metrics for the test.ping function call:

This will only generate a metric for the `test.ping` function executed:
```
salt_state_health{function="test.ping",minion="node1",state=""} 1
salt_function_status{function="test.ping",minion="node1",state=""} 1
```

You can disable all the health metrics with this config switch:
```./salt-exporter -health-minions=false```

Note: this also works for scheduled jobs.

### `salt/job/<jid>/new`

It increases:
17 changes: 8 additions & 9 deletions internal/metrics/metrics.go
@@ -50,6 +50,13 @@ func ExposeMetrics(ctx context.Context, eventChan <-chan events.SaltEvent, metri
},
[]string{"function", "state", "success"},
)
lastFunctionHealth := promauto.NewGaugeVec(
prometheus.GaugeOpts{
Name: "salt_function_status",
Help: "Last state success function, 0=Failed, 1=Success",
},
[]string{"minion", "function", "state"},
)

scheduledJobReturnCounter := promauto.NewCounterVec(
prometheus.CounterOpts{
@@ -66,14 +73,6 @@ func ExposeMetrics(ctx context.Context, eventChan <-chan events.SaltEvent, metri
[]string{"function", "state"},
)

lastStateHealth := promauto.NewGaugeVec(
prometheus.GaugeOpts{
Name: "salt_job_health",
Help: "Last state success state, 0=Failed, 1=Success",
},
[]string{"minion", "function", "state"},
)

for {
select {
case <-ctx.Done():
@@ -117,7 +116,7 @@ func ExposeMetrics(ctx context.Context, eventChan <-chan events.SaltEvent, metri
// Expose state/func status
if metricsConfig.HealthMinions {
if contains(metricsConfig.HealthFunctionsFilters, event.Data.Fun) && contains(metricsConfig.HealthStatesFilters, state) {
lastStateHealth.WithLabelValues(
lastFunctionHealth.WithLabelValues(
event.Data.Id,
event.Data.Fun,
state).Set(boolToFloat64(event.Data.Success))
4 changes: 2 additions & 2 deletions prometheus_alerts/highstate.yaml
@@ -2,7 +2,7 @@ groups:
- name: saltstack
rules:
- alert: SaltExporterLastHighstateSuccess
expr: sum by(minion) (salt_state_health{function="state.highstate", state="highstate"} == 0)
expr: sum by(minion) (salt_function_health{function="state.highstate", state="highstate"} == 0)
for: 60m
labels:
severity: critical
@@ -11,7 +11,7 @@ groups:
summary: "Salt Last Successful Highstate Failed (minion {{ $labels.minion }})"
description: "Salt Last Successful Highstate failed since > 60m"
- alert: SaltExporterLastHighstateSuccessInfo
expr: sum by(minion) (salt_state_health{function="state.highstate", state="highstate"} == 0)
expr: sum by(minion) (salt_function_health{function="state.highstate", state="highstate"} == 0)
for: 10m
labels:
severity: info
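
To illustrate the behavior described in the README hunk, here is a minimal, standard-library-only sketch of the "last status" logic: an event updates the value only when its function and its state both pass their filters, and a later event for the same minion/function/state overwrites the earlier one. This is not the exporter's code. The `statusStore` map is a hypothetical stand-in for the Prometheus `GaugeVec`, the bodies of `contains` and `boolToFloat64` are assumed (the diff only shows their call sites), and the filter values used in `main` are assumed defaults inferred from the README text.

```go
package main

import "fmt"

// contains reports whether needle is present in haystack.
// Assumed implementation: the diff calls a helper with this name
// but does not show its body.
func contains(haystack []string, needle string) bool {
	for _, s := range haystack {
		if s == needle {
			return true
		}
	}
	return false
}

// boolToFloat64 maps true to 1 and false to 0, matching the
// "0=Failed, 1=Success" help text. Assumed implementation.
func boolToFloat64(b bool) float64 {
	if b {
		return 1
	}
	return 0
}

// statusKey mirrors the gauge's label set: minion, function, state.
type statusKey struct {
	Minion, Function, State string
}

// statusStore is a hypothetical stand-in for the Prometheus GaugeVec:
// one float value per label combination.
type statusStore map[statusKey]float64

// record applies both filters and, when the function/state pair passes,
// overwrites the stored value with the latest result.
func (s statusStore) record(functionsFilter, statesFilter []string,
	minion, fun, state string, success bool) {
	if contains(functionsFilter, fun) && contains(statesFilter, state) {
		s[statusKey{minion, fun, state}] = boolToFloat64(success)
	}
}

func main() {
	store := statusStore{}
	// Assumed filter values, chosen so that only highstates are tracked.
	functions := []string{"state.highstate"}
	states := []string{"highstate"}

	// A failed highstate followed by a successful one: the last result wins.
	store.record(functions, states, "node1", "state.highstate", "highstate", false)
	store.record(functions, states, "node1", "state.highstate", "highstate", true)
	// test.ping is not in the functions filter, so nothing is recorded.
	store.record(functions, states, "node1", "test.ping", "", true)

	for k, v := range store {
		fmt.Printf("salt_function_status{function=%q,minion=%q,state=%q} %v\n",
			k.Function, k.Minion, k.State, v)
	}
}
```

Because the value is overwritten rather than accumulated, an alert such as `... == 0` held for 60m only fires while the most recent matching job for that minion is still a failure.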
