pageserver: expose prometheus metrics for startup time #4893
These are broken down by phase, where the phases correspond to the existing wait points in the code:
- Start of doing I/O
- When tenant load is done
- When initial size calculation is done
- When background jobs start
- Then "complete" when everything is done.
0941b21 to 6707d58
1264 tests run: 1214 passed, 0 failed, 50 skipped.
We should add these metrics to ... list of global metrics somewhere in the Python regress suite. Those tests are quite a mess: we assert metrics in two different files and they fail easily, but a global metric is not bad. I will have to do the same for #4892. Feel free to look into #4813 as well. I'll link this to some observability epic.
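A check of this kind in the Python suite could look roughly like the sketch below. It parses Prometheus text exposition output and collects the value reported for each phase label of one metric. Only the metric name pageserver_startup_duration_ms comes from this PR. The helper name phase_values and the simplified parsing (no escaped quotes or braces inside label values) are assumptions for illustration and are not code from the regress suite.

```python
import re
from typing import Dict

# Matches `name{labels} value`; a simplification of the exposition format
# that assumes label values contain no `}` and no escaped quotes.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def phase_values(exposition: str, metric: str) -> Dict[str, float]:
    """Return {phase label value: sample value} for one metric (hypothetical helper)."""
    values: Dict[str, float] = {}
    for line in exposition.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if "phase" in labels:
            values[labels["phase"]] = float(match.group("value"))
    return values
```

A test could then assert that every expected phase appears in the returned dictionary, rather than comparing exact durations, which vary from run to run.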
Adding a time-to-activate metric, as requested in #4083, might be a good next step in a separate PR.
chronologically equivalent currently, but more future-proof to put it here.
f83db0a to 31c0731
This reverts commit 4798976.
Follow-ups include more work on #4813; otherwise this is looking good now. We don't have cancellation safety for these metrics, but I cannot imagine a situation in which we'd need it, because we would then be restarting faster than the scraping delay.
Co-authored-by: Joonas Koivunen <joonas@neon.tech>
Problem
Currently, finding out how long pageserver startup took requires inspecting logs.
Summary of changes
A pageserver_startup_duration_ms metric is added, with a phase label for the different phases of startup. The phases correspond to the existing wait points in the code.
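To illustrate the idea of one metric with a phase label, here is a minimal sketch in Python (the pageserver itself is not written in Python, and this is not the PR's implementation). It assumes each phase records the milliseconds elapsed since process start and that the metric is exposed as a gauge; the PR text states neither. The class name StartupTimer and the phase strings used with it are hypothetical.

```python
import time
from typing import Callable, Dict


class StartupTimer:
    """Records milliseconds elapsed since construction at each startup phase."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._started_at = clock()
        # Stand-in for a gauge with a `phase` label: phase -> elapsed ms.
        self.durations_ms: Dict[str, int] = {}

    def record(self, phase: str) -> int:
        """Store and return the time elapsed so far, in ms, under `phase`."""
        elapsed_ms = round((self._clock() - self._started_at) * 1000)
        self.durations_ms[phase] = elapsed_ms
        return elapsed_ms

    def render(self, metric: str = "pageserver_startup_duration_ms") -> str:
        """Render the recorded phases in Prometheus text exposition format."""
        lines = [f"# TYPE {metric} gauge"]  # gauge type is an assumption
        for phase, ms in self.durations_ms.items():
            lines.append(f'{metric}{{phase="{phase}"}} {ms}')
        return "\n".join(lines) + "\n"
```

Calling record at each wait point and then scraping render would yield one sample per phase, so a dashboard can show where startup time is spent without anyone reading logs.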