fix: change query for uptime stat panel #840
@mem I changed the query according to your comment's solution.
src/scenes/Common/uptimeStat.ts
sum_over_time(
# the inner query is going to produce a non-zero value if there was at least one successful check during the 5 minute window
# so make it a 1 if there was at least one success and a 0 otherwise
ceil(
I think this side of the query is missing a sum_over_time.
The idea is to compute the number of times at least one probe reported success and add that up over the entire range (sum_over_time); that number should then be divided by the number of observations made (count_over_time).
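To make the arithmetic in that description concrete, here is a minimal TypeScript sketch of the same ratio computed over in-memory samples. It is not the PromQL from this PR: the function names (`windowIsUp`, `uptime`) and the data shape (one array of per-probe 0/1 success values per evaluation window) are assumptions made for illustration only.

```typescript
// Hypothetical sketch: the names and data shapes below are illustrative and
// are not taken from the PR or from uptimeStat.ts.

/** Success values (0 or 1) reported by each probe within one window. */
type WindowSamples = number[];

/**
 * Returns 1 if at least one probe reported success in the window, 0 otherwise.
 * Mirrors the ceil(...) step: any non-zero average success rate becomes 1.
 */
function windowIsUp(samples: WindowSamples): number {
  if (samples.length === 0) {
    return 0;
  }
  const mean = samples.reduce((acc, v) => acc + v, 0) / samples.length;
  return Math.ceil(mean);
}

/**
 * Uptime over the whole range: the number of windows with at least one
 * success (the "sum over time" part) divided by the number of windows that
 * have observations (the "count over time" part).
 * Returns NaN when there are no observations at all.
 */
function uptime(windows: WindowSamples[]): number {
  const observed = windows.filter((w) => w.length > 0);
  if (observed.length === 0) {
    return NaN;
  }
  const upWindows = observed.reduce((acc, w) => acc + windowIsUp(w), 0);
  return upWindows / observed.length;
}
```

With this definition, a range in which every probe fails in every window yields 0 rather than 100%, and a range where three of four windows have at least one successful probe yields 0.75.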
I'm sorry, I just realized this is the query we had to fix for a different reason. I think the query itself is correct. We had to add a transformation that collects the data for the entire range because Mimir has a limit on the amount of data that it's willing to retrieve. I was confused because the "explore" link in the panel drops that transformation. Let me take another look.
https://github.com/grafana/support-escalations/issues/11197#issuecomment-2218047905 -- update as of 2024/07/09
I added a new uptime query version from @mem's proposal:
It is under a feature flag named
LGTM 🎉 , thanks @VikaCep!
What this PR does / why we need it:
This PR fixes the query that is used to calculate the uptime stat panel for each check. As described in this escalation, this panel was displaying a value of 100% even when the probes had 100% errors reaching the target.
Which issue(s) this PR fixes:
Part of https://github.com/grafana/support-escalations/issues/11197. This solution addresses the first problem described here https://github.com/grafana/support-escalations/issues/11197#issuecomment-2195657543