
Re-enqueue 429 requests if there are multiple query-schedulers #5496

Merged: 2 commits merged into cortexproject:master from damnever:fix/retry-429 on Aug 9, 2023

Conversation

@damnever (Contributor) commented Aug 3, 2023

What this PR does:

The current implementation cannot ensure that one tenant's requests are evenly distributed among all schedulers. As a result, some scheduler queues may be full, while others still have some room left.

This PR retries requests rejected with 429 (too many outstanding requests) against a randomly chosen scheduler when multiple query-schedulers are running. A more sophisticated approach may be needed to address this issue effectively.
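For context, here is a minimal Go sketch of the idea described above, under a deliberately simplified model; the scheduler type, tryEnqueue, and enqueue names below are illustrative, not the actual Cortex frontend or scheduler code:

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

var errTooManyOutstanding = errors.New("too many outstanding requests")

type scheduler struct {
	queue chan string
}

// tryEnqueue rejects the request when this scheduler's queue is full,
// standing in for a scheduler answering the frontend with a 429.
func (s *scheduler) tryEnqueue(req string) error {
	select {
	case s.queue <- req:
		return nil
	default:
		return errTooManyOutstanding
	}
}

// enqueue picks a random scheduler; on a 429-style rejection it re-enqueues
// once onto another randomly chosen scheduler, but only when more than one
// scheduler is available.
func enqueue(schedulers []*scheduler, req string) error {
	err := schedulers[rand.Intn(len(schedulers))].tryEnqueue(req)
	if errors.Is(err, errTooManyOutstanding) && len(schedulers) > 1 {
		return schedulers[rand.Intn(len(schedulers))].tryEnqueue(req)
	}
	return err
}

func main() {
	schedulers := []*scheduler{
		{queue: make(chan string)},    // no capacity: always "full"
		{queue: make(chan string, 1)}, // has room
	}
	fmt.Println(enqueue(schedulers, "query-1"))
}
```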

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@@ -192,6 +194,8 @@ func (f *Frontend) RoundTripGRPC(ctx context.Context, req *httpgrpc.HTTPRequest)
// even if this goroutine goes away due to client context cancellation.
enqueue: make(chan enqueueResult, 1),
response: make(chan *frontendv2pb.QueryResultRequest, 1),

retryOnTooManyOutstandingRequests: f.schedulerWorkers.getWorkersCount() > 0,
Contributor:
If multiple query-schedulers have full queues, won't retrying make queueing even worse? A request will keep retrying until it is no longer discarded by the queue.

Contributor:

I think the retry can still be useful. Let's add a feature flag?

@damnever (author):

retries := f.cfg.WorkerConcurrency + 1 limits the maximum number of retries.

A feature flag sounds good.
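For illustration, a rough Go sketch of the bounded, flag-gated retry discussed in this thread; the config fields, frontend struct, and helper names are assumptions made for the sketch, not the actual Cortex frontend code:

```go
package main

import "errors"

var errTooManyOutstanding = errors.New("too many outstanding requests")

type config struct {
	WorkerConcurrency                 int
	RetryOnTooManyOutstandingRequests bool
}

type frontend struct {
	cfg     config
	enqueue func() error // stands in for enqueueing to a randomly chosen scheduler
}

func (f *frontend) enqueueWithBoundedRetry() error {
	retries := f.cfg.WorkerConcurrency + 1 // retry budget mentioned in the reply above
	for {
		err := f.enqueue()
		if err == nil {
			return nil
		}
		// Retry only 429-style rejections, only when the flag is enabled,
		// and only while the retry budget lasts.
		if !f.cfg.RetryOnTooManyOutstandingRequests || !errors.Is(err, errTooManyOutstanding) || retries <= 0 {
			return err
		}
		retries--
	}
}

func main() {
	f := &frontend{
		cfg:     config{WorkerConcurrency: 2, RetryOnTooManyOutstandingRequests: true},
		enqueue: func() error { return errTooManyOutstanding },
	}
	_ = f.enqueueWithBoundedRetry() // gives up after the retry budget is exhausted
}
```

In this shape, a rejected request is retried at most WorkerConcurrency + 1 times and only when the flag is enabled, which bounds the retries as described in the reply above.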

@pull-request-size pull-request-size bot added size/M and removed size/S labels Aug 3, 2023
@yeya24 (Contributor) commented Aug 3, 2023

@damnever Can you fix DCO?

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
@yeya24 (Contributor) left a comment:

Thanks!

@yeya24 yeya24 merged commit f791f5b into cortexproject:master Aug 9, 2023
wenxu1024 pushed a commit to wenxu1024/cortex that referenced this pull request Aug 9, 2023
Re-enqueue 429 requests if there are multiple query-schedulers (cortexproject#5496)

* Re-enqueue 429 requests if there are multiple query-schedulers

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

* Add feature flag

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>

---------

Signed-off-by: Xiaochao Dong (@damnever) <the.xcdong@gmail.com>
@damnever damnever deleted the fix/retry-429 branch August 10, 2023 03:55