docs: add nomad.plan.node_rejected metric (#11860)
lgfa29 committed Jan 18, 2022
1 parent 6134014 commit a0c0b80
Showing 1 changed file with 25 additions and 22 deletions.
47 changes: 25 additions & 22 deletions website/content/docs/operations/metrics-reference.mdx
@@ -92,28 +92,29 @@ table below is defined to be their flush interval. Otherwise, the interval can
be assumed to be 10 seconds when retrieving metrics using the above described
signals.

| Metrics | Description | Unit | Type |
| -------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ | ------- |
| `nomad.runtime.alloc_bytes` | Memory utilization | # of bytes | Gauge |
| `nomad.runtime.heap_objects` | Number of objects on the heap. General memory pressure indicator | # of heap objects | Gauge |
| `nomad.runtime.num_goroutines` | Number of goroutines and general load pressure indicator | # of goroutines | Gauge |
| `nomad.nomad.broker.total_blocked` | Evaluations that are blocked until an existing evaluation for the same job completes | # of evaluations | Gauge |
| `nomad.nomad.broker.total_ready` | Number of evaluations ready to be processed | # of evaluations | Gauge |
| `nomad.nomad.broker.total_unacked` | Evaluations dispatched for processing but incomplete | # of evaluations | Gauge |
| `nomad.nomad.heartbeat.active` | Number of active heartbeat timers. Each timer represents a Nomad Client connection | # of heartbeat timers | Gauge |
| `nomad.nomad.heartbeat.invalidate` | The length of time it takes to invalidate a Nomad Client due to failed heartbeats | ms / Heartbeat Invalidation | Timer |
| `nomad.nomad.plan.evaluate` | Time to validate a scheduler Plan. Higher values cause lower scheduling throughput. Similar to `nomad.plan.submit` but does not include RPC time or time in the Plan Queue | ms / Plan Evaluation | Timer |
| `nomad.nomad.plan.queue_depth` | Number of scheduler Plans waiting to be evaluated | # of plans | Gauge |
| `nomad.nomad.plan.submit` | Time to submit a scheduler Plan. Higher values cause lower scheduling throughput | ms / Plan Submit | Timer |
| `nomad.nomad.rpc.query` | Number of RPC queries | RPC Queries / `interval` | Counter |
| `nomad.nomad.rpc.request_error` | Number of RPC requests being handled that result in an error | RPC Errors / `interval` | Counter |
| `nomad.nomad.rpc.request` | Number of RPC requests being handled | RPC Requests / `interval` | Counter |
| `nomad.nomad.worker.invoke_scheduler.<type>` | Time to run the scheduler of the given type | ms / Scheduler Run | Timer |
| `nomad.nomad.worker.wait_for_index` | Time waiting for Raft log replication from leader. High delays result in lower scheduling throughput | ms / Raft Index Wait | Timer |
| `nomad.raft.apply` | Number of Raft transactions | Raft transactions / `interval` | Counter |
| `nomad.raft.leader.lastContact` | Time since last contact to leader. General indicator of Raft latency | ms / Leader Contact | Timer |
| `nomad.raft.replication.appendEntries` | Raft transaction commit time | ms / Raft Log Append | Timer |
| `nomad.license.expiration_time_epoch` | Time as epoch (seconds since Jan 1 1970) at which license will expire | Seconds | Gauge |
| Metrics | Description | Unit | Type |
| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ------------------------------ | ------- |
| `nomad.runtime.alloc_bytes` | Memory utilization | # of bytes | Gauge |
| `nomad.runtime.heap_objects` | Number of objects on the heap. General memory pressure indicator | # of heap objects | Gauge |
| `nomad.runtime.num_goroutines` | Number of goroutines and general load pressure indicator | # of goroutines | Gauge |
| `nomad.nomad.broker.total_blocked` | Evaluations that are blocked until an existing evaluation for the same job completes | # of evaluations | Gauge |
| `nomad.nomad.broker.total_ready` | Number of evaluations ready to be processed | # of evaluations | Gauge |
| `nomad.nomad.broker.total_unacked` | Evaluations dispatched for processing but incomplete | # of evaluations | Gauge |
| `nomad.nomad.heartbeat.active` | Number of active heartbeat timers. Each timer represents a Nomad Client connection | # of heartbeat timers | Gauge |
| `nomad.nomad.heartbeat.invalidate` | The length of time it takes to invalidate a Nomad Client due to failed heartbeats | ms / Heartbeat Invalidation | Timer |
| `nomad.nomad.plan.evaluate` | Time to validate a scheduler Plan. Higher values cause lower scheduling throughput. Similar to `nomad.plan.submit` but does not include RPC time or time in the Plan Queue | ms / Plan Evaluation | Timer |
| `nomad.nomad.plan.node_rejected` | Number of times a node has had a plan rejected. A node with a high rate of rejections may have an underlying issue causing it to be unschedulable. Refer to [this link][s_port_plan_failure] for more information | # of rejected plans | Counter |
| `nomad.nomad.plan.queue_depth` | Number of scheduler Plans waiting to be evaluated | # of plans | Gauge |
| `nomad.nomad.plan.submit` | Time to submit a scheduler Plan. Higher values cause lower scheduling throughput | ms / Plan Submit | Timer |
| `nomad.nomad.rpc.query` | Number of RPC queries | RPC Queries / `interval` | Counter |
| `nomad.nomad.rpc.request_error` | Number of RPC requests being handled that result in an error | RPC Errors / `interval` | Counter |
| `nomad.nomad.rpc.request` | Number of RPC requests being handled | RPC Requests / `interval` | Counter |
| `nomad.nomad.worker.invoke_scheduler.<type>` | Time to run the scheduler of the given type | ms / Scheduler Run | Timer |
| `nomad.nomad.worker.wait_for_index` | Time waiting for Raft log replication from leader. High delays result in lower scheduling throughput | ms / Raft Index Wait | Timer |
| `nomad.raft.apply` | Number of Raft transactions | Raft transactions / `interval` | Counter |
| `nomad.raft.leader.lastContact` | Time since last contact to leader. General indicator of Raft latency | ms / Leader Contact | Timer |
| `nomad.raft.replication.appendEntries` | Raft transaction commit time | ms / Raft Log Append | Timer |
| `nomad.license.expiration_time_epoch` | Time as epoch (seconds since Jan 1 1970) at which license will expire | Seconds | Gauge |

## Client Metrics

@@ -389,6 +390,7 @@ those listed in [Key Metrics](#key-metrics) above.
| `nomad.nomad.periodic.force` | Time elapsed for `Periodic.Force` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.plan.apply` | Time elapsed to apply a plan | Nanoseconds | Summary | host |
| `nomad.nomad.plan.evaluate` | Time elapsed to evaluate a plan | Nanoseconds | Summary | host |
| `nomad.nomad.plan.node_rejected` | Number of times a node has had a plan rejected | Integer | Counter | host, node_id |
| `nomad.nomad.plan.queue_depth` | Count of evals in the plan queue | Integer | Gauge | host |
| `nomad.nomad.plan.submit` | Time elapsed for `Plan.Submit` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.plan.wait_for_index` | Time elapsed that planner waits for the raft index of the plan to be processed | Nanoseconds | Summary | host |
@@ -449,3 +451,4 @@ those listed in [Key Metrics](#key-metrics) above.


[tagged-metrics]: /docs/telemetry/metrics#tagged-metrics
[s_port_plan_failure]: /s/port-plan-failure
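
As a rough illustration of how the `nomad.nomad.plan.node_rejected` counter and its `node_id` label might be used to spot a node with a high rate of rejections, the sketch below totals rejections per node from a metrics payload. The payload shape (a `Counters` list whose entries carry `Name`, `Count`, and `Labels`) is an assumption about the JSON a metrics endpoint might return, and the function names, sample data, and threshold are hypothetical and not part of this commit.

```python
import json
from collections import defaultdict

METRIC_NAME = "nomad.nomad.plan.node_rejected"


def rejections_by_node(payload):
    """Sum plan rejections per node_id.

    Assumes `payload` is a dict with a "Counters" list whose entries have
    "Name", "Count", and "Labels" keys (an assumed shape, not confirmed here).
    """
    totals = defaultdict(int)
    for counter in payload.get("Counters", []):
        if counter.get("Name") != METRIC_NAME:
            continue
        # The reference table lists `host` and `node_id` as labels.
        node_id = counter.get("Labels", {}).get("node_id", "unknown")
        totals[node_id] += int(counter.get("Count", 0))
    return dict(totals)


def noisy_nodes(payload, threshold=5):
    """Return node IDs whose total rejections meet a hypothetical threshold."""
    totals = rejections_by_node(payload)
    return sorted(node for node, count in totals.items() if count >= threshold)


# Hypothetical sample data for illustration only.
SAMPLE = json.loads("""
{
  "Counters": [
    {"Name": "nomad.nomad.plan.node_rejected", "Count": 4,
     "Labels": {"host": "server-1", "node_id": "node-a"}},
    {"Name": "nomad.nomad.plan.node_rejected", "Count": 3,
     "Labels": {"host": "server-2", "node_id": "node-a"}},
    {"Name": "nomad.nomad.plan.node_rejected", "Count": 1,
     "Labels": {"host": "server-1", "node_id": "node-b"}},
    {"Name": "nomad.nomad.rpc.query", "Count": 100,
     "Labels": {"host": "server-1"}}
  ]
}
""")

if __name__ == "__main__":
    print(rejections_by_node(SAMPLE))
    print(noisy_nodes(SAMPLE, threshold=5))
```

Summing across the `host` label gives a per-node total regardless of which server recorded the rejection. A suitable threshold depends on cluster size and scheduling volume.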
