Add response time metrics (keep track of when the page times out) #45

Open
MadLittleMods opened this issue Jul 29, 2022 · 0 comments
Labels

  • A-metrics: stats, metrics, dashboards
  • A-tracing: OpenTelemetry tracing (spans, timing, observability)
  • T-Enhancement: New feature or request

MadLittleMods commented Jul 29, 2022

Add a metric for when the page times out. Record the Matrix API request that is still running and the duration.

Things to record in each event:

  • Response status code we ended up sending
  • Total time spent on the server rendering the request (for a timeout, this will just end up being the configured timeout)
  • Homeserver
  • Room ID
    • Since this has very high cardinality (lots of possible values), we might not be able to index it, but it would be good to have on each metric event to inspect.
    • These extra details are nice if we want to investigate why a particular room/homeserver combo is timing out
  • Matrix API endpoint path that is still running when we timed out (like /join, /messages)
    • Is this useful? It would be nice to know where most requests get stuck.
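
A rough sketch of how the fields above could be collected, assuming each outgoing Matrix API call is wrapped so we know what is still pending when the timeout fires. `PendingRequestTracker`, `buildTimeoutEvent`, and the field names are hypothetical and only for illustration, not existing code in the project.

```javascript
// Hypothetical helper: remembers which Matrix API calls are still in flight
// for a single page request.
class PendingRequestTracker {
  constructor() {
    this.pending = new Map(); // id -> { endpoint, startedAt }
    this.nextId = 0;
  }

  // Run `fn` (a function returning a promise) and keep it registered until it settles.
  async track(endpoint, fn) {
    const id = this.nextId++;
    this.pending.set(id, { endpoint, startedAt: Date.now() });
    try {
      return await fn();
    } finally {
      this.pending.delete(id);
    }
  }

  // Endpoints that have not settled yet, in the order they were started.
  pendingEndpoints() {
    return [...this.pending.values()].map((entry) => entry.endpoint);
  }
}

// Assemble the event we would record when the page times out.
function buildTimeoutEvent({ tracker, statusCode, durationMs, homeserver, roomId }) {
  return {
    statusCode,
    durationMs,
    homeserver,
    // High cardinality: keep as a plain event field rather than an indexed label.
    roomId,
    pendingEndpoints: tracker.pendingEndpoints(),
  };
}
```

The timeout handler would call `buildTimeoutEvent(...)` right before sending the response, so the event reflects whatever was still running at that moment.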

We can also send a success metric with the response time, so we can compare how many requests we're failing to serve against total traffic.

Dev notes

We probably just need to add something like prom-client, expose a Prometheus /metrics scrape endpoint that serves the output of await register.metrics(), and then add a scrape annotation to the K8s service (which is still being finalized).

Adjacent: Here is an example middleware from the Gitter webapp that logs and records a metric when a request has been pending for more than 60 seconds, https://gitlab.com/gitterHQ/webapp/-/blob/676fadc3693260c8c51f448a0ca4c3e180d1b4a2/server/web/middlewares/pending-request.js#L50-84

MadLittleMods added the T-Enhancement and A-tracing labels on Jul 29, 2022
MadLittleMods added the A-metrics label on Dec 1, 2022