nomad operator debug changelog / doc updates
davemay99 authored and tgross committed Dec 8, 2020
1 parent 1fb1c48 commit bb5b0b8
Showing 2 changed files with 54 additions and 12 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -17,6 +17,10 @@ IMPROVEMENTS:
* api: Added ?task_states=false query parameter to /v1/allocations to remove TaskStates from listings. Defaults to being included as before. [[GH-9055](https://github.com/hashicorp/nomad/issues/9055)]
* build: Updated to Go 1.15.5. [[GH-9345](https://github.com/hashicorp/nomad/issues/9345)]
* cli: Added autocompletion for `recommendation` commands [[GH-9317](https://github.com/hashicorp/nomad/issues/9317)]
* cli: Added client node filtering arguments to `nomad operator debug` command. [[GH-9331](https://github.com/hashicorp/nomad/pull/9331)]
* cli: Added goroutine debug pprof output and server-id=all to `nomad operator debug` capture. [[GH-9067](https://github.com/hashicorp/nomad/pull/9067)]
* cli: Added metrics to `nomad operator debug` capture. [[GH-9034](https://github.com/hashicorp/nomad/pull/9034)]
* cli: Added pprof duration and CSI details to `nomad operator debug` capture. [[GH-9346](https://github.com/hashicorp/nomad/pull/9346)]
* cli: Added `scale` and `scaling-events` subcommands to the `job` command. [[GH-9023](https://github.com/hashicorp/nomad/pull/9023)]
* cli: Added `scaling` command for interaction with the scaling API endpoint. [[GH-9025](https://github.com/hashicorp/nomad/pull/9025)]
* client: Use ec2 CPU perf data from AWS API [[GH-7830](https://github.com/hashicorp/nomad/issues/7830)]
62 changes: 50 additions & 12 deletions website/pages/docs/commands/operator/debug.mdx
@@ -38,7 +38,9 @@ configured.
If ACLs are enabled, this command will require a token with the 'node:read'
capability to run. In order to collect information, the token will also
require the 'agent:read' and 'operator:read' capabilities, as well as the
'list-jobs' capability for all namespaces.
'list-jobs' capability for all namespaces. To collect pprof profiles, the
token also requires the 'agent:write' capability, or the agent's
enable_debug configuration must be set to true.

## General Options

@@ -55,12 +57,24 @@ require the 'agent:read' and 'operator:read' capabilities, as well as the

- `-log-level=DEBUG`: The log level to monitor. Defaults to `DEBUG`.

- `-node-id=n1,n2`: Comma separated list of Nomad client node ids, to
monitor for logs and include pprof data. Accepts id prefixes.
- `-max-nodes=<count>`: Cap the maximum number of client nodes included
in the capture. Defaults to 10; set to 0 for unlimited.

- `-server-id=s1,s2`: Comma separated list of Nomad server names, or
the special server name "leader" to monitor for logs and include
pprof data.
- `-node-class=<node-class>`: Filter client nodes based on node class.

- `-node-id=<node1>,<node2>`: Comma-separated list of Nomad client node ids
to monitor for logs and include pprof profiles. Accepts id prefixes, and
"all" to select all nodes (up to the `-max-nodes` limit).

- `-pprof-duration=<duration>`: Duration for pprof collection. Defaults to 1s.

- `-server-id=s1,s2`: Comma separated list of Nomad server names, "leader", or
"all" to monitor for logs and include pprof profiles.

- `-stale=<true|false>`: If "false", the default, get membership data from the
cluster leader. If the cluster is in an outage and unable to establish
leadership, it may be necessary to set this to "true" to get the
configuration from a non-leader server.

- `-output=path`: Path to the parent directory of the output
directory. Defaults to the current directory. If specified, no
@@ -108,18 +122,42 @@ require the 'agent:read' and 'operator:read' capabilities, as well as the

## Output

This command prints the name of the timestamped archive file produced.
This command prints a summary of the capture and the name of the timestamped
archive file produced.

## Examples

```shell-session
$ nomad operator debug -duration 20s -interval 5s -server-id leader -node-id 6e,dd
Starting debugger and capturing cluster data...
Interval: '5s'
Duration: '20s'
$ nomad operator debug -duration 5s -interval 5s -server-id all -node-id b5,20
Starting debugger...
Servers: (3/3) [server1.global server2.global server3.global]
Clients: (2/3) [b547cd3a-085f-68c2-55f4-e99beebb0433 20c0964b-72cc-4083-87fe-ec6905b6230a]
Interval: 5s
Duration: 5s
Capturing cluster data...
Capture interval 0000
Capture interval 0001
Capture interval 0002
Capture interval 0003
Created debug archive: nomad-debug-2020-12-08-034455Z.tar.gz
```

```shell-session
$ nomad operator debug -duration 5s -interval 5s -server-id all -node-id all -max-nodes=1
Starting debugger...
Servers: (3/3) [server1.global server2.global server3.global]
Clients: (1/3) [b547cd3a-085f-68c2-55f4-e99beebb0433]
Max node count reached (1)
Interval: 5s
Duration: 5s
Capturing cluster data...
Capture interval 0000
Capture interval 0001
Capture interval 0002
Capture interval 0003
Created debug archive: nomad-debug-2020-07-20-205223Z.tar.gz
Created debug archive: nomad-debug-2020-12-08-034113Z.tar.gz
```
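
The client node filters documented in this change (`-node-id` prefixes or "all", `-node-class`, and the `-max-nodes` cap) can be modelled with a short sketch. This is not Nomad's implementation: the `Node` type, the `select_nodes` function, and the order in which the filters are applied are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Nomad client node (not a Nomad API type)."""
    id: str
    node_class: str = ""


def select_nodes(
    nodes: List[Node],
    node_id_arg: str,
    node_class: Optional[str] = None,
    max_nodes: int = 10,
) -> List[Node]:
    """Model the documented filters: id prefixes or "all", node class, then cap."""
    wanted = [p.strip() for p in node_id_arg.split(",") if p.strip()]
    select_all = "all" in wanted

    selected = []
    for node in nodes:
        # Assumption: the class filter is applied together with the id filter.
        if node_class is not None and node.node_class != node_class:
            continue
        if select_all or any(node.id.startswith(prefix) for prefix in wanted):
            selected.append(node)

    # Per the option text, 0 means unlimited; a positive value caps the result.
    if max_nodes > 0:
        selected = selected[:max_nodes]
    return selected
```

Under these assumptions, a three-node cluster queried with `"b5,20"` yields two nodes, and `"all"` with `max_nodes=1` yields one, which matches the `(2/3)` and `(1/3)` client counts in the example sessions above.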
