Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add max-inflight-requests limit to store gateway #5553

Merged
merged 9 commits into from
Sep 13, 2023

Conversation

justinjung04
Copy link
Contributor

@justinjung04 justinjung04 commented Sep 8, 2023

What this PR does:

Store gateway can only handle set number of requests at a given time, and rest of the requests are just queued up, waiting in the queryGate (channel). Since queriers select store gateways randomly without any information about their load, lots of requests could flood to one store gateway instance and slows down overall query latency (or could even lead to timeouts).

This PR add a new configuration -blocks-storage.bucket-store.max-inflight-request that allows requests to be rejected fast, which then queriers could try other store gateway replicas without waiting.

Which issue(s) this PR fixes:
n/a

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Justin Jung <jungjust@amazon.com>
@justinjung04 justinjung04 force-pushed the sg-max-inflight-request branch from b7f7b10 to e88b17e Compare September 8, 2023 23:16
@justinjung04 justinjung04 changed the title Add max-inflight-request limit to bucket stores Add max-inflight-requests limit to bucket stores Sep 8, 2023
Signed-off-by: Justin Jung <jungjust@amazon.com>
@justinjung04 justinjung04 changed the title Add max-inflight-requests limit to bucket stores Add max-inflight-requests limit to store gateway Sep 8, 2023
Signed-off-by: Justin Jung <jungjust@amazon.com>
@justinjung04 justinjung04 force-pushed the sg-max-inflight-request branch from 46978a9 to 5b4cfdc Compare September 11, 2023 17:58
Signed-off-by: Justin Jung <jungjust@amazon.com>
@justinjung04 justinjung04 force-pushed the sg-max-inflight-request branch 2 times, most recently from 086dfb6 to 851d1e9 Compare September 11, 2023 22:05
Signed-off-by: Justin Jung <jungjust@amazon.com>
pkg/storegateway/bucket_stores.go Outdated Show resolved Hide resolved
pkg/storegateway/bucket_stores.go Outdated Show resolved Hide resolved
pkg/querier/blocks_store_queryable.go Show resolved Hide resolved
@justinjung04 justinjung04 marked this pull request as ready for review September 11, 2023 22:48
Signed-off-by: Justin Jung <jungjust@amazon.com>
@justinjung04 justinjung04 force-pushed the sg-max-inflight-request branch from 851d1e9 to c6b1264 Compare September 11, 2023 23:09
pkg/storegateway/bucket_stores.go Outdated Show resolved Hide resolved
pkg/querier/blocks_store_queryable.go Show resolved Hide resolved
Copy link
Member

@friedrichg friedrichg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like what you are doing.
Isn't this more like an instance limit?
It would be nice to have metrics like it was done in #4071

pkg/storage/tsdb/config.go Outdated Show resolved Hide resolved
docs/blocks-storage/querier.md Show resolved Hide resolved
pkg/querier/blocks_store_queryable.go Show resolved Hide resolved
@yeya24
Copy link
Contributor

yeya24 commented Sep 12, 2023

It would be nice to have metrics like it was done in #4071

We already have metrics for current usage in https://github.com/cortexproject/cortex/blob/master/vendor/github.com/weaveworks/common/server/server.go#L316

@friedrichg
Copy link
Member

right that is one, the other one was instanceLimitsMetric which shows the current limit. If it is indeed an instance limit.

I mean, I am just helping 😄 , not blocking. The requests for changes is for the tiny nit on the flag.

@yeya24
Copy link
Contributor

yeya24 commented Sep 12, 2023

@friedrichg Thanks for the review and help. I was just saying we already have the usage metric, agree we can add the limit metrics if necessary. All good, not blocking at all.

…ight before making the series call

Signed-off-by: Justin Jung <jungjust@amazon.com>
@justinjung04
Copy link
Contributor Author

I liked the idea of grouping all instance limits, so on top of this new limit also created metrics for existing limits:

  • Max concurrent
  • Max chunk pool bytes

Signed-off-by: Justin Jung <jungjust@amazon.com>
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yeya24
Copy link
Contributor

yeya24 commented Sep 12, 2023

@justinjung04 test failure seems related to the new metrics added. Once we fix that we can merge

Signed-off-by: Justin Jung <jungjust@amazon.com>
@yeya24 yeya24 enabled auto-merge (squash) September 12, 2023 21:25
@yeya24 yeya24 disabled auto-merge September 12, 2023 21:25
@yeya24 yeya24 enabled auto-merge (squash) September 12, 2023 21:26
@yeya24 yeya24 merged commit 5ca94a0 into cortexproject:master Sep 13, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants