GRPC metrics middleware implemented as a Tower Layer #3931
Labels
A-telemetry
Area: Metrics, logging, and other observability-related features
_P-V2
Priority: after mainnet
Is your feature request related to a problem? Please describe.
Recently, the public RPC fell into a degraded state as someone made too many requests to it. In this case, we identified the cause via an accidental backchannel. However, had that not happened we would have been totally unable to determine what the cause was, what kind of requests we were getting, and what was happening to them, because we only have metrics on the requests we already had a reason to care about.
Instead, we need to have generic GRPC metrics that work with any GRPC method.
Describe the solution you'd like
Scope and implement a tower Layer that we can apply to our GRPC services. The trace middleware is probably a good reference implementation to study.
We should identify a few relevant metrics and then emit them with the GRPC method name as a metrics key. That would allow us to take cross-sections of each metric by rpc method and identify performance culprits.
Suggestions to get started:
Ideas for later:
We should implement this specifically as a Tower Layer rather than spending time adding additional specific metrics; it will be a bit more work upfront but will have much better results long term.
The text was updated successfully, but these errors were encountered: