cli: add storage engine metrics to debug zip #125865
Great work! Broadly I like the approach here
pkg/server/debug/server.go
Outdated
@@ -221,28 +220,33 @@ func (ds *Server) RegisterWorkloadCollector(stores *kvserver.Stores) error {
	return nil
}

// GetEngineMetrics generates a string combining debug metrics for all of the provided storage engines.
func GetEngineMetrics(engines []storage.Engine) (string, error) {
nit: maybe we should call this GetLSMStats instead
Agreed, this is fixed
pkg/server/serverpb/status.proto
Outdated
message EngineStatsRequest {
  // node_id is a string so that "local" can be used to specify that no
  // forwarding is necessary.
  string node_id = 1;
}

message EngineStatsResponse {
  repeated EngineStatsInfo stats = 1 [ (gogoproto.nullable) = false ];
  // contains LSM metrics for all stores on the requested node
  string metrics = 1;
If we're changing the protobuf anyway, I'd have a slight preference for repeated string metrics [(gogoproto.nullable) = false], so that we have more infrastructure ready for per-store responses if we ever move in that direction. You can move the concatenation logic into a helper that can then be used by debug.GetEngineStats.
Also, the convention when changing protobufs is that the new field doesn't use the same argument ID (the = 1 at the end) as the old field. You might need to make this 2, and you might also need a reserved 1 above it for the missing 1, but maybe that last part isn't necessary.
Thanks for calling out the argument ID convention - I've updated to use = 2.
I also opted to use a store ID -> string map rather than a list of strings so that we can filter by the store ID if needed in the future.
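A minimal Go sketch of the trade-off described above, assuming the response carries a map from store ID to stats text. The names `StoreID`, `concatStats`, and `statsForStore` are hypothetical and are not taken from the PR; the sketch only illustrates how a map allows both a stable concatenation and a lookup by store ID.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// StoreID is a hypothetical stand-in for the store identifier type.
type StoreID int32

// concatStats joins per-store stats into one text blob. Stores are visited
// in ascending ID order so the output is stable across calls, since Go map
// iteration order is not.
func concatStats(statsByStore map[StoreID]string) string {
	ids := make([]StoreID, 0, len(statsByStore))
	for id := range statsByStore {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })

	var sb strings.Builder
	for _, id := range ids {
		fmt.Fprintf(&sb, "Store %d:\n%s\n", id, statsByStore[id])
	}
	return sb.String()
}

// statsForStore returns the stats for a single store. A plain list of
// strings would need extra bookkeeping to support this lookup.
func statsForStore(statsByStore map[StoreID]string, id StoreID) (string, bool) {
	s, ok := statsByStore[id]
	return s, ok
}

func main() {
	// Placeholder stats text; real LSM output is not reproduced here.
	stats := map[StoreID]string{
		2: "placeholder stats for store 2",
		1: "placeholder stats for store 1",
	}
	fmt.Print(concatStats(stats))
	if s, ok := statsForStore(stats, 2); ok {
		fmt.Println("store 2 only:", s)
	}
}
```

Sorting the keys before concatenating keeps the combined text deterministic, which matters if the output is diffed between debug zips.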
I did some more investigation into whether we can just make a request to Since the I also looked into whether we can just include these LSM stats as a field inside the
@anish-shanbhag Thanks for looking into the debug endpoints! Given all of what you pointed out, especially the ability to dial other nodes and the ability to call the status server over gRPC in addition to HTTP, it makes sense to just use EngineStats for this. I'm broadly in support of the approach you have here, and we can scope this change to just 1) updating EngineStats to return the LSM stats for each store, and 2) pulling EngineStats and adding them to the debug zip. Once this change is merged, we should update #100825 to reflect the update to EngineStats, but otherwise leave that as a standalone issue.
Metrics from the storage engine are already exposed in the `/debug/lsm` HTTP endpoint. These can be useful when debugging storage issues, and so this change adds these metrics to the debug zip under `/nodes/$N/lsm.txt` in the same text format as the HTTP route. The previously unused `EngineStats` status endpoint was repurposed to serve these metrics from each node.

Fixes: cockroachdb#79518
Epic: none
Release note: none
LGTM, great work!
TFTR! bors r=itsbilal
Metrics from the storage engine are already exposed in the `/debug/lsm` HTTP endpoint. These can be useful when debugging storage issues, and so this change adds these metrics to the debug zip under `/nodes/$N/lsm.txt` in the same text format as the HTTP route. The previously unused `EngineStats` status endpoint was repurposed to serve these metrics from each node.

Fixes: #79518
Epic: none
Release note: none
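As a rough illustration of the layout described above, the following Go sketch writes one `lsm.txt` entry per node into a zip archive using only the standard library. It is not the debug zip implementation: `lsmPath`, `writeLSMStats`, and `listEntries` are hypothetical helpers, and the stats text is placeholder data. The leading slash from `/nodes/$N/lsm.txt` is dropped because `archive/zip` entry names must be relative paths.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"sort"
)

// lsmPath returns the archive path for one node's LSM stats, mirroring the
// nodes/$N/lsm.txt layout without a leading slash.
func lsmPath(nodeID int) string {
	return fmt.Sprintf("nodes/%d/lsm.txt", nodeID)
}

// writeLSMStats adds one lsm.txt entry per node, in ascending node ID
// order, and returns the resulting archive bytes.
func writeLSMStats(statsByNode map[int]string) ([]byte, error) {
	ids := make([]int, 0, len(statsByNode))
	for id := range statsByNode {
		ids = append(ids, id)
	}
	sort.Ints(ids)

	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, id := range ids {
		w, err := zw.Create(lsmPath(id))
		if err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte(statsByNode[id])); err != nil {
			return nil, err
		}
	}
	// Close flushes the central directory; the archive is invalid without it.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listEntries reads an archive back and returns its entry names in order.
func listEntries(data []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(zr.File))
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

func main() {
	data, err := writeLSMStats(map[int]string{
		1: "placeholder stats for node 1",
		2: "placeholder stats for node 2",
	})
	if err != nil {
		panic(err)
	}
	names, err := listEntries(data)
	if err != nil {
		panic(err)
	}
	for _, name := range names {
		fmt.Println(name)
	}
}
```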