Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics for rejected queries in QFE #5356

Merged
merged 6 commits into from
May 24, 2023

Conversation

yeya24
Copy link
Contributor

@yeya24 yeya24 commented May 23, 2023

What this PR does:

Add metric cortex_rejected_queries_total in QFE.

Which issue(s) this PR fixes:
Fixes #5355

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@@ -303,6 +350,38 @@ func (f *Handler) reportQueryStats(r *http.Request, queryString url.Values, quer
} else {
level.Info(util_log.WithContext(r.Context(), f.log)).Log(logMessage...)
}

var reason string
// 413, 422 or 429.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think we need this comment :-) It's just redundant with the if statements.

)

var (
LimitTooManySamples = `query processing would load too many samples into memory`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be exported? I only see this being used in the transport package.

@@ -6,6 +6,7 @@
* [CHANGE] Ingester: Creating label `native-histogram-sample` on the `cortex_discarded_samples_total` to keep track of discarded native histogram samples. #5289
* [FEATURE] Store Gateway: Add `max_downloaded_bytes_per_request` to limit max bytes to download per store gateway request.
* [FEATURE] Added 2 flags `-alertmanager.alertmanager-client.grpc-max-send-msg-size` and ` -alertmanager.alertmanager-client.grpc-max-recv-msg-size` to configure alert manager grpc client message size limits. #5338
* [FEATURE] Query Frontend: Add `cortex_discarded_queries_total` metric for throttled queries. #5356
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cortex_failed_queries_total is a better name than cortex_discarded_queries_total based what we are doing in this PR. It doesn't make sense that a query can be "discarded".

using "failed" has the benefit that we can use the same metric name for 5xx later on -- just have different value for reason.

I would recommend to change this entry to
[FEATURE] Query Frontend: Add cortex_failed_queries_totalmetric for failed queries withreason label. #5356

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree cortex_discarded_queries_total is not a good name here. What I want to track is probably cortex_throttled_queries_total.
cortex_failed_queries_total is too general and I don't really want to track 5xx with this metric, 5xx can be of difference reasons and I feel it is more useful to track throttled queries for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yea cortex_throttled_queries_total make more sense. But "throttled" in this context is still weird because it's supposed to mean "flow control", but what we are doing here is not flow control when chunk is too big.

How about cortex_rejected_queries_total?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the name of cortex_rejected_queries_total!

reasonTooManyRequests = "too_many_requests"
reasonTooLongRange = "too_long_range"
reasonTooManySamples = "too_many_samples"
reasonSeriesFetched = "series_fetched"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think it would be more consistent if we use too_many_series_fetched. I am thinking about cortex_discarded_queries_total{reason="series_fetched"} v.s. cortex_discarded_queries_total{reason="too_many_series_fetched"}; I think later one is more clear to reader.

However, the counter argument is that the label name become bigger and may have performance impact. In which case maybe we should change too_many_samples to samples?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too_many_ is kind of redundant I feel as we usually don't limit too_few. I will try to rename a few but I have to keep too_many_requests and too_many_samples. It is not easy to find a better name

pkg/frontend/transport/handler.go Outdated Show resolved Hide resolved
yeya24 added 4 commits May 24, 2023 11:13
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
@yeya24 yeya24 force-pushed the query-throttle-limit branch from 131cee9 to bac7b1f Compare May 24, 2023 19:09
yeya24 added 2 commits May 24, 2023 19:22
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
@yeya24 yeya24 changed the title Add metrics for discarded queries in QFE Add metrics for rejected queries in QFE May 24, 2023
Copy link
Contributor

@alvinlin123 alvinlin123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍺 🍻

Copy link
Contributor

@alvinlin123 alvinlin123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍺 🍻

@alvinlin123 alvinlin123 merged commit 8175ef6 into cortexproject:master May 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discarded queries metric in QFE
3 participants