cli: add storage engine metrics to debug zip
Metrics from the storage engine are already exposed in the
`/debug/lsm` HTTP endpoint. These can be useful when debugging storage
issues, and so this change adds these metrics to the debug zip under
`/nodes/$N/lsm.txt` in the same text format as the HTTP route. The
previously unused `EngineStats` status endpoint was repurposed to
serve these metrics from each node.

Fixes: #79518
Epic: none
Release note: none
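
Reviewer's sketch (not part of the commit): the message above describes each node returning pretty-printed LSM stats keyed by store ID, which the CLI then writes to a single `lsm.txt`. One way to flatten such a map into deterministic text is shown below. The function name `formatStatsByStoreID` and the `Store N:` header are assumptions made for illustration; the actual layout is whatever the `/debug/lsm` route emits.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatStatsByStoreID is a hypothetical helper: it renders a map of
// store ID -> pretty-printed LSM stats as one text blob, ordered by store ID
// so that repeated runs produce identical output.
// The "Store N:" header is an assumed layout, not the real /debug/lsm format.
func formatStatsByStoreID(stats map[int32]string) string {
	ids := make([]int32, 0, len(stats))
	for id := range stats {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })

	var b strings.Builder
	for _, id := range ids {
		// Trim trailing newlines so every section ends with exactly one blank line.
		fmt.Fprintf(&b, "Store %d:\n%s\n\n", id, strings.TrimRight(stats[id], "\n"))
	}
	return b.String()
}

func main() {
	// Placeholder strings stand in for real per-store LSM metrics.
	fmt.Print(formatStatsByStoreID(map[int32]string{
		2: "placeholder stats for store 2",
		1: "placeholder stats for store 1",
	}))
}
```

Sorting the keys matters because Go map iteration order is randomized; without it, two debug zips taken from the same node could differ only in section order.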
anish-shanbhag committed Jun 21, 2024
1 parent a4e20b7 commit 5611494
Showing 23 changed files with 117 additions and 128 deletions.
11 changes: 5 additions & 6 deletions docs/generated/http/full.md
@@ -1971,23 +1971,22 @@ Support status: [reserved](#support-status)

 | Field | Type | Label | Description | Support status |
 | ----- | ---- | ----- | ----------- | -------------- |
-| stats | [EngineStatsInfo](#cockroach.server.serverpb.EngineStatsResponse-cockroach.server.serverpb.EngineStatsInfo) | repeated | | [reserved](#support-status) |
+| stats_by_store_id | [EngineStatsResponse.StatsByStoreIdEntry](#cockroach.server.serverpb.EngineStatsResponse-cockroach.server.serverpb.EngineStatsResponse.StatsByStoreIdEntry) | repeated | maps store IDs to pretty-printed stats about the store's LSM. | [reserved](#support-status) |






-<a name="cockroach.server.serverpb.EngineStatsResponse-cockroach.server.serverpb.EngineStatsInfo"></a>
-#### EngineStatsInfo
+<a name="cockroach.server.serverpb.EngineStatsResponse-cockroach.server.serverpb.EngineStatsResponse.StatsByStoreIdEntry"></a>
+#### EngineStatsResponse.StatsByStoreIdEntry



 | Field | Type | Label | Description | Support status |
 | ----- | ---- | ----- | ----------- | -------------- |
-| store_id | [int32](#cockroach.server.serverpb.EngineStatsResponse-int32) | | | [reserved](#support-status) |
-| tickers_and_histograms | [cockroach.storage.enginepb.TickersAndHistograms](#cockroach.server.serverpb.EngineStatsResponse-cockroach.storage.enginepb.TickersAndHistograms) | | | [reserved](#support-status) |
-| engine_type | [cockroach.storage.enginepb.EngineType](#cockroach.server.serverpb.EngineStatsResponse-cockroach.storage.enginepb.EngineType) | | | [reserved](#support-status) |
+| key | [int32](#cockroach.server.serverpb.EngineStatsResponse-int32) | | | |
+| value | [string](#cockroach.server.serverpb.EngineStatsResponse-string) | | | |



1 change: 1 addition & 0 deletions pkg/cli/BUILD.bazel
@@ -142,6 +142,7 @@ go_library(
 "//pkg/security/username",
 "//pkg/server",
 "//pkg/server/authserver",
+"//pkg/server/debug",
 "//pkg/server/pgurl",
 "//pkg/server/profiler",
 "//pkg/server/serverctl",
10 changes: 5 additions & 5 deletions pkg/cli/testdata/zip/partial1
@@ -114,10 +114,10 @@ debug zip --concurrency=1 --cpu-profile-duration=0s /dev/null
[node 1] retrieving SQL data for crdb_internal.node_txn_stats... writing output: debug/nodes/1/crdb_internal.node_txn_stats.txt... done
[node 1] requesting data for debug/nodes/1/details... received response... writing JSON output: debug/nodes/1/details.json... done
[node 1] requesting data for debug/nodes/1/gossip... received response... writing JSON output: debug/nodes/1/gossip.json... done
[node 1] requesting data for debug/nodes/1/enginestats... received response... writing JSON output: debug/nodes/1/enginestats.json... done
[node 1] requesting stacks... received response... writing binary output: debug/nodes/1/stacks.txt... done
[node 1] requesting stacks with labels... received response... writing binary output: debug/nodes/1/stacks_with_labels.txt... done
[node 1] requesting heap profile... received response... writing binary output: debug/nodes/1/heap.pprof... done
[node 1] requesting engine stats... received response... writing binary output: debug/nodes/1/lsm.txt... done
[node 1] requesting heap profile list... received response...
[node 1] requesting heap profile list: last request failed: rpc error: ...
[node 1] requesting heap profile list: creating error output: debug/nodes/1/heapprof.err.txt... done
@@ -208,9 +208,6 @@ debug zip --concurrency=1 --cpu-profile-duration=0s /dev/null
[node 2] requesting data for debug/nodes/2/gossip... received response...
[node 2] requesting data for debug/nodes/2/gossip: last request failed: rpc error: ...
[node 2] requesting data for debug/nodes/2/gossip: creating error output: debug/nodes/2/gossip.json.err.txt... done
[node 2] requesting data for debug/nodes/2/enginestats... received response...
[node 2] requesting data for debug/nodes/2/enginestats: last request failed: rpc error: ...
[node 2] requesting data for debug/nodes/2/enginestats: creating error output: debug/nodes/2/enginestats.json.err.txt... done
[node 2] requesting stacks... received response...
[node 2] requesting stacks: last request failed: rpc error: ...
[node 2] requesting stacks: creating error output: debug/nodes/2/stacks.txt.err.txt... done
@@ -220,6 +217,9 @@ debug zip --concurrency=1 --cpu-profile-duration=0s /dev/null
[node 2] requesting heap profile... received response...
[node 2] requesting heap profile: last request failed: rpc error: ...
[node 2] requesting heap profile: creating error output: debug/nodes/2/heap.pprof.err.txt... done
[node 2] requesting engine stats... received response...
[node 2] requesting engine stats: last request failed: rpc error: ...
[node 2] requesting engine stats: creating error output: debug/nodes/2/lsm.txt.err.txt... done
[node 2] requesting heap profile list... received response...
[node 2] requesting heap profile list: last request failed: rpc error: ...
[node 2] requesting heap profile list: creating error output: debug/nodes/2/heapprof.err.txt... done
@@ -262,10 +262,10 @@ debug zip --concurrency=1 --cpu-profile-duration=0s /dev/null
[node 3] retrieving SQL data for crdb_internal.node_txn_stats... writing output: debug/nodes/3/crdb_internal.node_txn_stats.txt... done
[node 3] requesting data for debug/nodes/3/details... received response... writing JSON output: debug/nodes/3/details.json... done
[node 3] requesting data for debug/nodes/3/gossip... received response... writing JSON output: debug/nodes/3/gossip.json... done
[node 3] requesting data for debug/nodes/3/enginestats... received response... writing JSON output: debug/nodes/3/enginestats.json... done
[node 3] requesting stacks... received response... writing binary output: debug/nodes/3/stacks.txt... done
[node 3] requesting stacks with labels... received response... writing binary output: debug/nodes/3/stacks_with_labels.txt... done
[node 3] requesting heap profile... received response... writing binary output: debug/nodes/3/heap.pprof... done
[node 3] requesting engine stats... received response... writing binary output: debug/nodes/3/lsm.txt... done
[node 3] requesting heap profile list... received response...
[node 3] requesting heap profile list: last request failed: rpc error: ...
[node 3] requesting heap profile list: creating error output: debug/nodes/3/heapprof.err.txt... done
4 changes: 2 additions & 2 deletions pkg/cli/testdata/zip/partial1_excluded
@@ -114,10 +114,10 @@ debug zip /dev/null --concurrency=1 --exclude-nodes=2 --cpu-profile-duration=0
[node 1] retrieving SQL data for crdb_internal.node_txn_stats... writing output: debug/nodes/1/crdb_internal.node_txn_stats.txt... done
[node 1] requesting data for debug/nodes/1/details... received response... writing JSON output: debug/nodes/1/details.json... done
[node 1] requesting data for debug/nodes/1/gossip... received response... writing JSON output: debug/nodes/1/gossip.json... done
[node 1] requesting data for debug/nodes/1/enginestats... received response... writing JSON output: debug/nodes/1/enginestats.json... done
[node 1] requesting stacks... received response... writing binary output: debug/nodes/1/stacks.txt... done
[node 1] requesting stacks with labels... received response... writing binary output: debug/nodes/1/stacks_with_labels.txt... done
[node 1] requesting heap profile... received response... writing binary output: debug/nodes/1/heap.pprof... done
[node 1] requesting engine stats... received response... writing binary output: debug/nodes/1/lsm.txt... done
[node 1] requesting heap profile list... received response...
[node 1] requesting heap profile list: last request failed: rpc error: ...
[node 1] requesting heap profile list: creating error output: debug/nodes/1/heapprof.err.txt... done
@@ -159,10 +159,10 @@ debug zip /dev/null --concurrency=1 --exclude-nodes=2 --cpu-profile-duration=0
[node 3] retrieving SQL data for crdb_internal.node_txn_stats... writing output: debug/nodes/3/crdb_internal.node_txn_stats.txt... done
[node 3] requesting data for debug/nodes/3/details... received response... writing JSON output: debug/nodes/3/details.json... done
[node 3] requesting data for debug/nodes/3/gossip... received response... writing JSON output: debug/nodes/3/gossip.json... done
[node 3] requesting data for debug/nodes/3/enginestats... received response... writing JSON output: debug/nodes/3/enginestats.json... done
[node 3] requesting stacks... received response... writing binary output: debug/nodes/3/stacks.txt... done
[node 3] requesting stacks with labels... received response... writing binary output: debug/nodes/3/stacks_with_labels.txt... done
[node 3] requesting heap profile... received response... writing binary output: debug/nodes/3/heap.pprof... done
[node 3] requesting engine stats... received response... writing binary output: debug/nodes/3/lsm.txt... done
[node 3] requesting heap profile list... received response...
[node 3] requesting heap profile list: last request failed: rpc error: ...
[node 3] requesting heap profile list: creating error output: debug/nodes/3/heapprof.err.txt... done
4 changes: 2 additions & 2 deletions pkg/cli/testdata/zip/partial2
@@ -114,10 +114,10 @@ debug zip --concurrency=1 --cpu-profile-duration=0 /dev/null
[node 1] retrieving SQL data for crdb_internal.node_txn_stats... writing output: debug/nodes/1/crdb_internal.node_txn_stats.txt... done
[node 1] requesting data for debug/nodes/1/details... received response... writing JSON output: debug/nodes/1/details.json... done
[node 1] requesting data for debug/nodes/1/gossip... received response... writing JSON output: debug/nodes/1/gossip.json... done
[node 1] requesting data for debug/nodes/1/enginestats... received response... writing JSON output: debug/nodes/1/enginestats.json... done
[node 1] requesting stacks... received response... writing binary output: debug/nodes/1/stacks.txt... done
[node 1] requesting stacks with labels... received response... writing binary output: debug/nodes/1/stacks_with_labels.txt... done
[node 1] requesting heap profile... received response... writing binary output: debug/nodes/1/heap.pprof... done
[node 1] requesting engine stats... received response... writing binary output: debug/nodes/1/lsm.txt... done
[node 1] requesting heap profile list... received response...
[node 1] requesting heap profile list: last request failed: rpc error: ...
[node 1] requesting heap profile list: creating error output: debug/nodes/1/heapprof.err.txt... done
@@ -158,10 +158,10 @@ debug zip --concurrency=1 --cpu-profile-duration=0 /dev/null
[node 3] retrieving SQL data for crdb_internal.node_txn_stats... writing output: debug/nodes/3/crdb_internal.node_txn_stats.txt... done
[node 3] requesting data for debug/nodes/3/details... received response... writing JSON output: debug/nodes/3/details.json... done
[node 3] requesting data for debug/nodes/3/gossip... received response... writing JSON output: debug/nodes/3/gossip.json... done
[node 3] requesting data for debug/nodes/3/enginestats... received response... writing JSON output: debug/nodes/3/enginestats.json... done
[node 3] requesting stacks... received response... writing binary output: debug/nodes/3/stacks.txt... done
[node 3] requesting stacks with labels... received response... writing binary output: debug/nodes/3/stacks_with_labels.txt... done
[node 3] requesting heap profile... received response... writing binary output: debug/nodes/3/heap.pprof... done
[node 3] requesting engine stats... received response... writing binary output: debug/nodes/3/lsm.txt... done
[node 3] requesting heap profile list... received response...
[node 3] requesting heap profile list: last request failed: rpc error: ...
[node 3] requesting heap profile list: creating error output: debug/nodes/3/heapprof.err.txt... done
2 changes: 1 addition & 1 deletion pkg/cli/testdata/zip/testzip
@@ -117,10 +117,10 @@ debug zip --concurrency=1 --cpu-profile-duration=1s /dev/null
[node 1] retrieving SQL data for crdb_internal.node_txn_stats... writing output: debug/nodes/1/crdb_internal.node_txn_stats.txt... done
[node 1] requesting data for debug/nodes/1/details... received response... writing JSON output: debug/nodes/1/details.json... done
[node 1] requesting data for debug/nodes/1/gossip... received response... writing JSON output: debug/nodes/1/gossip.json... done
[node 1] requesting data for debug/nodes/1/enginestats... received response... writing JSON output: debug/nodes/1/enginestats.json... done
[node 1] requesting stacks... received response... writing binary output: debug/nodes/1/stacks.txt... done
[node 1] requesting stacks with labels... received response... writing binary output: debug/nodes/1/stacks_with_labels.txt... done
[node 1] requesting heap profile... received response... writing binary output: debug/nodes/1/heap.pprof... done
[node 1] requesting engine stats... received response... writing binary output: debug/nodes/1/lsm.txt... done
[node 1] requesting heap profile list... received response... done
[node ?] ? heap profiles found
[node 1] requesting goroutine dump list... received response... done
24 changes: 12 additions & 12 deletions pkg/cli/testdata/zip/testzip_concurrent
@@ -278,14 +278,14 @@ zip
[node 1] requesting data for debug/nodes/1/details: done
[node 1] requesting data for debug/nodes/1/details: received response...
[node 1] requesting data for debug/nodes/1/details: writing JSON output: debug/nodes/1/details.json...
[node 1] requesting data for debug/nodes/1/enginestats...
[node 1] requesting data for debug/nodes/1/enginestats: done
[node 1] requesting data for debug/nodes/1/enginestats: received response...
[node 1] requesting data for debug/nodes/1/enginestats: writing JSON output: debug/nodes/1/enginestats.json...
[node 1] requesting data for debug/nodes/1/gossip...
[node 1] requesting data for debug/nodes/1/gossip: done
[node 1] requesting data for debug/nodes/1/gossip: received response...
[node 1] requesting data for debug/nodes/1/gossip: writing JSON output: debug/nodes/1/gossip.json...
[node 1] requesting engine stats...
[node 1] requesting engine stats: done
[node 1] requesting engine stats: received response...
[node 1] requesting engine stats: writing binary output: debug/nodes/1/lsm.txt...
[node 1] requesting goroutine dump list...
[node 1] requesting goroutine dump list: creating error output: debug/nodes/1/goroutines.err.txt...
[node 1] requesting goroutine dump list: done
@@ -399,14 +399,14 @@ zip
[node 2] requesting data for debug/nodes/2/details: done
[node 2] requesting data for debug/nodes/2/details: received response...
[node 2] requesting data for debug/nodes/2/details: writing JSON output: debug/nodes/2/details.json...
[node 2] requesting data for debug/nodes/2/enginestats...
[node 2] requesting data for debug/nodes/2/enginestats: done
[node 2] requesting data for debug/nodes/2/enginestats: received response...
[node 2] requesting data for debug/nodes/2/enginestats: writing JSON output: debug/nodes/2/enginestats.json...
[node 2] requesting data for debug/nodes/2/gossip...
[node 2] requesting data for debug/nodes/2/gossip: done
[node 2] requesting data for debug/nodes/2/gossip: received response...
[node 2] requesting data for debug/nodes/2/gossip: writing JSON output: debug/nodes/2/gossip.json...
[node 2] requesting engine stats...
[node 2] requesting engine stats: done
[node 2] requesting engine stats: received response...
[node 2] requesting engine stats: writing binary output: debug/nodes/2/lsm.txt...
[node 2] requesting goroutine dump list...
[node 2] requesting goroutine dump list: creating error output: debug/nodes/2/goroutines.err.txt...
[node 2] requesting goroutine dump list: done
@@ -520,14 +520,14 @@ zip
[node 3] requesting data for debug/nodes/3/details: done
[node 3] requesting data for debug/nodes/3/details: received response...
[node 3] requesting data for debug/nodes/3/details: writing JSON output: debug/nodes/3/details.json...
[node 3] requesting data for debug/nodes/3/enginestats...
[node 3] requesting data for debug/nodes/3/enginestats: done
[node 3] requesting data for debug/nodes/3/enginestats: received response...
[node 3] requesting data for debug/nodes/3/enginestats: writing JSON output: debug/nodes/3/enginestats.json...
[node 3] requesting data for debug/nodes/3/gossip...
[node 3] requesting data for debug/nodes/3/gossip: done
[node 3] requesting data for debug/nodes/3/gossip: received response...
[node 3] requesting data for debug/nodes/3/gossip: writing JSON output: debug/nodes/3/gossip.json...
[node 3] requesting engine stats...
[node 3] requesting engine stats: done
[node 3] requesting engine stats: received response...
[node 3] requesting engine stats: writing binary output: debug/nodes/3/lsm.txt...
[node 3] requesting goroutine dump list...
[node 3] requesting goroutine dump list: creating error output: debug/nodes/3/goroutines.err.txt...
[node 3] requesting goroutine dump list: done
2 changes: 1 addition & 1 deletion pkg/cli/testdata/zip/testzip_exclude_goroutine_stacks
@@ -117,9 +117,9 @@ debug zip --concurrency=1 --cpu-profile-duration=1s --include-goroutine-stacks=f
[node 1] retrieving SQL data for crdb_internal.node_txn_stats... writing output: debug/nodes/1/crdb_internal.node_txn_stats.txt... done
[node 1] requesting data for debug/nodes/1/details... received response... writing JSON output: debug/nodes/1/details.json... done
[node 1] requesting data for debug/nodes/1/gossip... received response... writing JSON output: debug/nodes/1/gossip.json... done
[node 1] requesting data for debug/nodes/1/enginestats... received response... writing JSON output: debug/nodes/1/enginestats.json... done
[node 1] Skipping fetching goroutine stacks. Enable via the --include-goroutine-stacks flag.
[node 1] requesting heap profile... received response... writing binary output: debug/nodes/1/heap.pprof... done
[node 1] requesting engine stats... received response... writing binary output: debug/nodes/1/lsm.txt... done
[node 1] requesting heap profile list... received response... done
[node ?] ? heap profiles found
[node 1] requesting goroutine dump list... received response... done
2 changes: 1 addition & 1 deletion pkg/cli/testdata/zip/testzip_exclude_range_info
@@ -113,10 +113,10 @@ debug zip --concurrency=1 --cpu-profile-duration=1s --include-range-info=false /
[node 1] retrieving SQL data for crdb_internal.node_txn_stats... writing output: debug/nodes/1/crdb_internal.node_txn_stats.txt... done
[node 1] requesting data for debug/nodes/1/details... received response... writing JSON output: debug/nodes/1/details.json... done
[node 1] requesting data for debug/nodes/1/gossip... received response... writing JSON output: debug/nodes/1/gossip.json... done
[node 1] requesting data for debug/nodes/1/enginestats... received response... writing JSON output: debug/nodes/1/enginestats.json... done
[node 1] requesting stacks... received response... writing binary output: debug/nodes/1/stacks.txt... done
[node 1] requesting stacks with labels... received response... writing binary output: debug/nodes/1/stacks_with_labels.txt... done
[node 1] requesting heap profile... received response... writing binary output: debug/nodes/1/heap.pprof... done
[node 1] requesting engine stats... received response... writing binary output: debug/nodes/1/lsm.txt... done
[node 1] requesting heap profile list... received response... done
[node ?] ? heap profiles found
[node 1] requesting goroutine dump list... received response... done