-
Notifications
You must be signed in to change notification settings - Fork 804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics for rejected queries in QFE #5356
Add metrics for rejected queries in QFE #5356
Conversation
pkg/frontend/transport/handler.go
Outdated
@@ -303,6 +350,38 @@ func (f *Handler) reportQueryStats(r *http.Request, queryString url.Values, quer | |||
} else { | |||
level.Info(util_log.WithContext(r.Context(), f.log)).Log(logMessage...) | |||
} | |||
|
|||
var reason string | |||
// 413, 422 or 429. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think we need this comment :-) It's just redundant with the if statements.
pkg/frontend/transport/handler.go
Outdated
) | ||
|
||
var ( | ||
LimitTooManySamples = `query processing would load too many samples into memory` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have to be exported? I only see this being used in the transport
package.
@@ -6,6 +6,7 @@ | |||
* [CHANGE] Ingester: Creating label `native-histogram-sample` on the `cortex_discarded_samples_total` to keep track of discarded native histogram samples. #5289 | |||
* [FEATURE] Store Gateway: Add `max_downloaded_bytes_per_request` to limit max bytes to download per store gateway request. | |||
* [FEATURE] Added 2 flags `-alertmanager.alertmanager-client.grpc-max-send-msg-size` and ` -alertmanager.alertmanager-client.grpc-max-recv-msg-size` to configure alert manager grpc client message size limits. #5338 | |||
* [FEATURE] Query Frontend: Add `cortex_discarded_queries_total` metric for throttled queries. #5356 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think cortex_failed_queries_total
is a better name than cortex_discarded_queries_total
based what we are doing in this PR. It doesn't make sense that a query can be "discarded".
using "failed" has the benefit that we can use the same metric name for 5xx later on -- just have different value for reason
.
I would recommend to change this entry to
[FEATURE] Query Frontend: Add
cortex_failed_queries_totalmetric for failed queries with
reason label. #5356
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree cortex_discarded_queries_total
is not a good name here. What I want to track is probably cortex_throttled_queries_total
.
cortex_failed_queries_total
is too general and I don't really want to track 5xx with this metric, 5xx can be of difference reasons and I feel it is more useful to track throttled queries for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, yea cortex_throttled_queries_total
make more sense. But "throttled" in this context is still weird because it's supposed to mean "flow control", but what we are doing here is not flow control when chunk is too big.
How about cortex_rejected_queries_total
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the name of cortex_rejected_queries_total
!
pkg/frontend/transport/handler.go
Outdated
reasonTooManyRequests = "too_many_requests" | ||
reasonTooLongRange = "too_long_range" | ||
reasonTooManySamples = "too_many_samples" | ||
reasonSeriesFetched = "series_fetched" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think it would be more consistent if we use too_many_series_fetched
. I am thinking about cortex_discarded_queries_total{reason="series_fetched"}
v.s. cortex_discarded_queries_total{reason="too_many_series_fetched"}
; I think later one is more clear to reader.
However, the counter argument is that the label name become bigger and may have performance impact. In which case maybe we should change too_many_samples
to samples
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too_many_
is kind of redundant I feel as we usually don't limit too_few
. I will try to rename a few but I have to keep too_many_requests
and too_many_samples
. It is not easy to find a better name
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
131cee9
to
bac7b1f
Compare
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🍺 🍻
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🍺 🍻
What this PR does:
Add metric
cortex_rejected_queries_total
in QFE.Which issue(s) this PR fixes:
Fixes #5355
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]