refactor!: rename salt_state_health to salt_function_health
Because it is not only for states.
Note: for a state, there is always a function.

BREAKING CHANGE: salt_state_health is replaced by salt_function_health.
Prometheus rules/alerts need to be adapted.
kpetremann committed Apr 17, 2023
1 parent fbc4741 commit d95da48
Showing 2 changed files with 8 additions and 6 deletions.
8 changes: 5 additions & 3 deletions README.md
@@ -91,6 +91,8 @@ salt_function_responses_total{function="state.highstate",state="highstate",succe
salt_function_responses_total{function="state.sls",state="test",success="true"} 1
salt_function_responses_total{function="state.single",state="test.nop",success="true"} 3
salt_function_status{minion="node1",function="state.highstate",state="highstate"} 1
salt_new_job_total{function="state.apply",state="highstate",success="false"} 1
salt_new_job_total{function="state.highstate",state="highstate",success="false"} 2
salt_new_job_total{function="state.sls",state="test",success="false"} 1
@@ -133,7 +135,7 @@ It increases:

Usually, it will increase the `salt_responses_total` (per minion) and `salt_function_responses_total` (per function) counters.

However, if it is a feedback of a scheduled job, it increases `salt_scheduled_job_return_total` instead.
However, if it is feedback from a scheduled job, it increases `salt_scheduled_job_return_total` instead.
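
To make that routing concrete, here is a minimal sketch (not the exporter's actual code: the event shape, label sets, and function names are simplified assumptions; only the metric names come from the documentation above):

```go
package sketch

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// saltEvent is a simplified, hypothetical event shape; the real exporter
// uses its own richer event type.
type saltEvent struct {
	Minion    string
	Function  string
	Success   string // "true" or "false"
	Scheduled bool   // true when the return comes from a scheduled job
}

var (
	responsesTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{Name: "salt_responses_total", Help: "Job responses per minion"},
		[]string{"minion", "success"},
	)
	functionResponsesTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{Name: "salt_function_responses_total", Help: "Job responses per function"},
		[]string{"function", "success"},
	)
	scheduledJobReturnTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{Name: "salt_scheduled_job_return_total", Help: "Scheduled job returns"},
		[]string{"function", "success"},
	)
)

// handleReturn increments the per-minion and per-function counters for a
// regular job return, or the scheduled-job counter when the return comes
// from the scheduler.
func handleReturn(e saltEvent) {
	if e.Scheduled {
		scheduledJobReturnTotal.WithLabelValues(e.Function, e.Success).Inc()
		return
	}
	responsesTotal.WithLabelValues(e.Minion, e.Success).Inc()
	functionResponsesTotal.WithLabelValues(e.Function, e.Success).Inc()
}
```

In the real exporter the function counter also carries the `state` label shown in the examples above; the sketch keeps only the labels needed to show the routing.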

#### Why separating `salt_responses_total` and `salt_scheduled_job_return_total`

@@ -154,7 +156,7 @@ It can be joined on the function label to get details per executed module.

## Estimated performance

According some simple benchmark, for a simple event, it takes:
According to some simple benchmark, for a simple event, it takes:
* ~60us for parsing
* ~9us for converting to Prometheus metric

@@ -164,4 +166,4 @@ Roughly, the exporter should be able to handle about 10kQps.

For a base of 1000 Salt minions, it should be able to sustain 10 jobs per minion per second, which is quite high for Salt.

If needed, the exporter can easily scale more up by doing the parsing in dedicated coroutines, the limiting factor being the Prometheus metric update (~9us).
If needed, the exporter can easily scale up further by doing the parsing in dedicated goroutines, the limiting factor being the Prometheus metric update (~9us).
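
As a sketch of what that scale-out could look like (a hypothetical worker pool, not the exporter's current architecture; the types and function names are illustrative):

```go
package sketch

import "sync"

// rawEvent and parsedEvent are placeholder types for the exporter's raw and
// parsed event representations.
type rawEvent []byte
type parsedEvent struct{ Tag string }

// parse stands in for the ~60us parsing step.
func parse(r rawEvent) parsedEvent { return parsedEvent{Tag: string(r)} }

// fanOutParse runs the parsing step in n dedicated goroutines. A single
// consumer can then read the returned channel and perform the ~9us Prometheus
// metric updates, which remain the only serialized part.
func fanOutParse(in <-chan rawEvent, n int) <-chan parsedEvent {
	out := make(chan parsedEvent)
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			for r := range in {
				out <- parse(r)
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}
```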
6 changes: 3 additions & 3 deletions internal/metrics/metrics.go
@@ -50,10 +50,10 @@ func ExposeMetrics(ctx context.Context, eventChan <-chan events.SaltEvent, metri
		},
		[]string{"function", "state", "success"},
	)
	lastFunctionHealth := promauto.NewGaugeVec(
	lastFunctionStatus := promauto.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "salt_function_status",
			Help: "Last state success function, 0=Failed, 1=Success",
			Help: "Last function/state success, 0=Failed, 1=Success",
		},
		[]string{"minion", "function", "state"},
	)
@@ -116,7 +116,7 @@ func ExposeMetrics(ctx context.Context, eventChan <-chan events.SaltEvent, metri
// Expose state/func status
if metricsConfig.HealthMinions {
	if contains(metricsConfig.HealthFunctionsFilters, event.Data.Fun) && contains(metricsConfig.HealthStatesFilters, state) {
		lastFunctionHealth.WithLabelValues(
		lastFunctionStatus.WithLabelValues(
			event.Data.Id,
			event.Data.Fun,
			state).Set(boolToFloat64(event.Data.Success))