docs: backport fix for metrics on blocked_evals vs broker (#15841)
In #15835 we renamed the `nomad.broker.total_blocked` metric to
`nomad.broker.total_pending`, but in the process identified that the existing
scheduling performance monitoring guide mixed up the `broker.total_blocked`
metric with the `blocked_evals.total_blocked` metric. This changeset backports
the fix to the docs without renaming the metric (for backwards compatibility).
tgross committed Jan 20, 2023
1 parent 5196040 commit 00173b1
Showing 1 changed file with 8 additions and 4 deletions.
12 changes: 8 additions & 4 deletions website/content/docs/operations/monitoring-nomad.mdx
@@ -187,16 +187,14 @@ points in the scheduling process.
evaluation at a time, entirely in-memory. If this metric increases,
examine the CPU and memory resources of the scheduler.

-- **nomad.broker.total_blocked** - The number of blocked
+- **nomad.blocked_evals.total_blocked** - The number of blocked
evaluations. Blocked evaluations are created when the scheduler
cannot place all allocations as part of a plan. Blocked evaluations
will be re-evaluated so that changes in cluster resources can be
used for the blocked evaluation's allocations. An increase in
blocked evaluations may mean that the cluster's clients are low in
resources or that job have been submitted that can never have all
-their allocations placed. Nomad also emits a similar metric for each
-individual scheduler. For example `nomad.broker.batch_blocked` shows
-the number of blocked evaluations for the batch scheduler.
+their allocations placed.

- **nomad.broker.total_unacked** - The number of unacknowledged
evaluations. When an evaluation has been processed, the worker sends
@@ -211,6 +209,12 @@ points in the scheduling process.
shows the number of unacknowledged evaluations for the batch
scheduler.

+- **nomad.broker.total_blocked** - The number of pending evaluations in the eval
+broker. Nomad processes only one evaluation for a given job concurrently. When
+an unacked evaluation is acknowledged, Nomad will discard all but the latest
+evaluation for a job. An increase in this metric may mean that the cluster
+state is changing more rapidly than the schedulers can keep up.

- **nomad.plan.evaluate** - The time to evaluate a scheduler plan
submitted by a worker. This operation happens on the leader to
serialize the plans of all the scheduler workers. This happens
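
Because the two `total_blocked` gauges differ only in their middle path segment, they are easy to confuse when reading raw metrics output. The following is a minimal sketch, not part of this commit, that labels each gauge with the meaning given in the updated docs. The `SAMPLE` payload, its `Gauges`/`Name`/`Value` shape, the values, and the `summarize` helper are all assumptions made for illustration.

```python
# Hypothetical metrics payload; the shape and values are assumptions for illustration.
SAMPLE = {
    "Gauges": [
        {"Name": "nomad.blocked_evals.total_blocked", "Value": 4},
        {"Name": "nomad.broker.total_blocked", "Value": 12},
        {"Name": "nomad.broker.total_unacked", "Value": 1},
    ]
}

# Meanings paraphrased from the updated monitoring guide.
MEANINGS = {
    "blocked_evals.total_blocked": "blocked evaluations (not all allocations could be placed)",
    "broker.total_blocked": "pending evaluations in the eval broker",
    "broker.total_unacked": "unacknowledged evaluations",
}


def summarize(payload):
    """Map each known metric suffix to a (value, meaning) pair."""
    summary = {}
    for gauge in payload.get("Gauges", []):
        for suffix, meaning in MEANINGS.items():
            # Match on ".<suffix>" so the two total_blocked gauges stay distinct.
            if gauge["Name"].endswith("." + suffix):
                summary[suffix] = (gauge["Value"], meaning)
    return summary


if __name__ == "__main__":
    for suffix, (value, meaning) in summarize(SAMPLE).items():
        print(f"{suffix}: {value} ({meaning})")
```

Matching on the full `blocked_evals.` or `broker.` segment, rather than on `total_blocked` alone, is what keeps the two gauges from being conflated.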