Add metrics for rejected queries in QFE #5356

yeya24 · 2023-05-23T02:23:41Z

What this PR does:

Adds the `cortex_rejected_queries_total` metric to the Query Frontend (QFE).

Which issue(s) this PR fixes:
Fixes #5355

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

alvinlin123 · 2023-05-24T00:13:48Z

pkg/frontend/transport/handler.go

@@ -303,6 +350,38 @@ func (f *Handler) reportQueryStats(r *http.Request, queryString url.Values, quer
 	} else {
 		level.Info(util_log.WithContext(r.Context(), f.log)).Log(logMessage...)
 	}
+
+	var reason string
+	// 413, 422 or 429.


nit: I don't think we need this comment :-) It's just redundant with the if statements.
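For readers without the full diff, here is a minimal sketch of the kind of status-code branching the comment sat on top of. It is not the PR's code: `rejectionReason` is a hypothetical helper, the mapping of 422 to a reason via the error message is an assumption, and 413 is left out because this excerpt does not show which reason it maps to. The reason strings and the limit message are the ones quoted elsewhere in this review. With named `net/http` constants in each branch, a comment listing the codes adds little.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Reason label values quoted in this review.
const (
	reasonTooManyRequests = "too_many_requests"
	reasonTooManySamples  = "too_many_samples"
)

// limitTooManySamples is the error text quoted in the diff. It is unexported
// here, as the review suggests.
const limitTooManySamples = `query processing would load too many samples into memory`

// rejectionReason is a hypothetical helper: it returns the reason label for a
// rejected query, or "" when the response is not treated as a rejection.
func rejectionReason(statusCode int, errMsg string) string {
	switch statusCode {
	case http.StatusTooManyRequests: // 429
		return reasonTooManyRequests
	case http.StatusUnprocessableEntity: // 422
		// Assumption: limit errors are told apart by their message text.
		if strings.Contains(errMsg, limitTooManySamples) {
			return reasonTooManySamples
		}
	}
	return ""
}

func main() {
	fmt.Println(rejectionReason(http.StatusTooManyRequests, ""))
	fmt.Println(rejectionReason(http.StatusUnprocessableEntity, "error: "+limitTooManySamples))
	fmt.Printf("%q\n", rejectionReason(http.StatusOK, ""))
}
```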

alvinlin123 · 2023-05-24T00:15:08Z

pkg/frontend/transport/handler.go

+)
+
+var (
+	LimitTooManySamples    = `query processing would load too many samples into memory`


Does this have to be exported? I only see this being used in the transport package.

alvinlin123 · 2023-05-24T00:23:31Z

CHANGELOG.md

@@ -6,6 +6,7 @@
 * [CHANGE] Ingester: Creating label `native-histogram-sample` on the `cortex_discarded_samples_total` to keep track of discarded native histogram samples. #5289
 * [FEATURE] Store Gateway: Add `max_downloaded_bytes_per_request` to limit max bytes to download per store gateway request.
 * [FEATURE] Added 2 flags `-alertmanager.alertmanager-client.grpc-max-send-msg-size` and ` -alertmanager.alertmanager-client.grpc-max-recv-msg-size` to configure alert manager grpc client message size limits. #5338
+* [FEATURE] Query Frontend: Add `cortex_discarded_queries_total` metric for throttled queries. #5356


I think cortex_failed_queries_total is a better name than cortex_discarded_queries_total based on what we are doing in this PR. It doesn't make sense for a query to be "discarded".

Using "failed" has the benefit that we can reuse the same metric name for 5xx responses later, just with a different value for reason.

I would recommend changing this entry to:
* [FEATURE] Query Frontend: Add `cortex_failed_queries_total` metric for failed queries with `reason` label. #5356

yeya24 · 2023-05-24

I agree cortex_discarded_queries_total is not a good name here. What I want to track is probably cortex_throttled_queries_total.
cortex_failed_queries_total is too general, and I don't really want to track 5xx with this metric. 5xx responses can have many different causes, and I feel it is more useful to track throttled queries for now.

alvinlin123 · 2023-05-24

I see, yeah, cortex_throttled_queries_total makes more sense. But "throttled" is still odd in this context: it implies flow control, and rejecting a query because a chunk is too big is not flow control.

How about cortex_rejected_queries_total?

yeya24 · 2023-05-24

Love the name cortex_rejected_queries_total!
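To make the agreed naming concrete, here is a small sketch of one counter name carrying a `reason` label. It is a stdlib-only stand-in, not the PR's implementation: the real metric would be registered through the Prometheus client library, and the `rejectedQueries` type, its methods, and the assumption that `reason` is the only label are all illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// rejectedQueries is a stand-in for a Prometheus counter vector with a single
// "reason" label (hypothetical type, for illustration only).
type rejectedQueries struct {
	mu     sync.Mutex
	counts map[string]float64
}

func newRejectedQueries() *rejectedQueries {
	return &rejectedQueries{counts: make(map[string]float64)}
}

// Inc records one rejected query for the given reason.
func (c *rejectedQueries) Inc(reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[reason]++
}

// Expose renders one line per reason in text exposition style, sorted by
// reason so the output is deterministic.
func (c *rejectedQueries) Expose() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	reasons := make([]string, 0, len(c.counts))
	for r := range c.counts {
		reasons = append(reasons, r)
	}
	sort.Strings(reasons)
	var b strings.Builder
	for _, r := range reasons {
		fmt.Fprintf(&b, "cortex_rejected_queries_total{reason=%q} %g\n", r, c.counts[r])
	}
	return b.String()
}

func main() {
	m := newRejectedQueries()
	m.Inc("too_many_samples")
	m.Inc("too_many_samples")
	m.Inc("too_many_requests")
	fmt.Print(m.Expose())
}
```

A single metric name with one series per reason keeps dashboards and alerts simple: summing over `reason` gives all rejections, and filtering on it isolates one limit.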

alvinlin123 · 2023-05-24T00:31:36Z

pkg/frontend/transport/handler.go

+	reasonTooManyRequests         = "too_many_requests"
+	reasonTooLongRange            = "too_long_range"
+	reasonTooManySamples          = "too_many_samples"
+	reasonSeriesFetched           = "series_fetched"


Nit: I think it would be more consistent to use too_many_series_fetched. Compare cortex_discarded_queries_total{reason="series_fetched"} vs. cortex_discarded_queries_total{reason="too_many_series_fetched"}; I think the latter is clearer to the reader.

However, the counterargument is that the label value becomes longer and may have a performance impact. In that case, maybe we should change too_many_samples to samples?

yeya24 · 2023-05-24

I feel too_many_ is somewhat redundant, since we don't usually limit too_few. I will try to rename a few, but I have to keep too_many_requests and too_many_samples; it is not easy to find better names for those.

Signed-off-by: Ben Ye <benye@amazon.com>

alvinlin123

🍺 🍻

pull-request-size bot added the size/L label May 23, 2023

alvinlin123 reviewed May 24, 2023

yeya24 added 4 commits May 24, 2023 11:13

metrics for query throttling

29e8c5d

Signed-off-by: Ben Ye <benye@amazon.com>

update changelog

b27d429

Signed-off-by: Ben Ye <benye@amazon.com>

lint

4436943

Signed-off-by: Ben Ye <benye@amazon.com>

address comment

bac7b1f

Signed-off-by: Ben Ye <benye@amazon.com>

yeya24 force-pushed the query-throttle-limit branch from 131cee9 to bac7b1f May 24, 2023 19:09

yeya24 added 2 commits May 24, 2023 19:22

update proto

cdb5566

Signed-off-by: Ben Ye <benye@amazon.com>

set limit hit in query frontend

90d828f

Signed-off-by: Ben Ye <benye@amazon.com>

alanprot approved these changes May 24, 2023

yeya24 changed the title ~~Add metrics for discarded queries in QFE~~ Add metrics for rejected queries in QFE May 24, 2023

alvinlin123 approved these changes May 24, 2023

alvinlin123 merged commit 8175ef6 into cortexproject:master May 24, 2023
