[FEATURE REQUEST] Ability to control/speed up sleep when processing empty queue #506

Open · yousifh opened this issue Jul 1, 2022 · 3 comments
Labels: designing, enhancement

@yousifh (Contributor) commented Jul 1, 2022

Is your feature request related to a problem? Please describe.
I have been benchmarking this library and measuring time-to-execution latency, i.e. the time between when a task is enqueued and when it is picked up by a worker. I'm using the following parameters:

  • 7 worker processes
  • 32 concurrency per worker
  • Enqueuing 40k tasks/min
  • Workers combined can process 45k tasks/min

One thing I noticed is that the average time-to-execution latency was about 190ms, which seemed quite high. Digging into the code, the issue seems to be the sleep the processor takes when it hits an empty queue:

asynq/processor.go, lines 176 to 184 in c70ff6a:

case errors.Is(err, errors.ErrNoProcessableTask):
p.logger.Debug("All queues are empty")
// Queues are empty, this is a normal behavior.
// Sleep to avoid slamming redis and let scheduler move tasks into queues.
// Note: We are not using blocking pop operation and polling queues instead.
// This adds significant load to redis.
time.Sleep(time.Second)
<-p.sema // release token
return

Since, globally, the workers can process tasks at a faster rate than tasks are being enqueued, they frequently hit this 1 sec sleep, which increases the latency of picking up tasks.

Describe the solution you'd like
I prototyped a couple of solutions to see if there are ways to lower the latency.

The first solution was to make the sleep period configurable, and I set it to 100ms. That seemed reasonable given that Asynq's Redis operations run very fast, and 100ms still gives other background work, like the scheduler, enough time to run.
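Roughly what the prototype looks like from the caller's side (EmptyQueueSleepDuration is just a placeholder name for illustration, not an existing option):

package main

import (
	"time"

	"github.com/hibiken/asynq"
)

func main() {
	// Sketch only: EmptyQueueSleepDuration is a hypothetical config field that
	// would default to 1 * time.Second to preserve the current behaviour.
	srv := asynq.NewServer(
		asynq.RedisClientOpt{Addr: "localhost:6379"},
		asynq.Config{
			Concurrency: 32,
			// How long the processor sleeps after finding all queues empty.
			EmptyQueueSleepDuration: 100 * time.Millisecond,
		},
	)
	_ = srv
}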

Running the benchmark again, the latency dropped to an average of 30ms. Looking at Redis metrics while idle with 7 workers, the redis.cpu.user metric was around 2ms before the change and about 5ms after it. It more than doubled, but still seems reasonable.

Of course, those are idle conditions; during normal operation the impact on the Redis server is negligible. For example, during the benchmark the redis.cpu.user metric averaged 94ms in both cases, but the lower sleep period had the advantage of lower latency.

Describe alternatives you've considered
I tested an alternative solution in which the sleep period for empty queues dynamically varies between 1 sec and 100ms, backing off based on whether fetches see empty or non-empty queues: it starts at 1 sec, decreases toward 100ms on every successive fetch that finds a task, and increases back up to 1 sec on fetches that find the queues empty. But that just seemed to add more complexity to the code.

Additional context
If you think the first approach is reasonable, I can push a PR that lets users set the sleep period while keeping the default at 1 sec. Apps that don't need lower latency can ignore the option, while apps that do can set it, at the cost of higher CPU utilization on the Redis server while it's idle.

If there are other solutions to explore, I'm more than happy to prototype and open a PR.

yousifh added the enhancement label on Jul 1, 2022
@hibiken (Owner) commented Jul 2, 2022

@yousifh Thank you for creating this issue! And thanks for providing this insight, very helpful.

In an ideal world, Redis would provide a command to do just what we want, but that's not likely (context: redis/redis#1785).

Making the sleep time configurable sounds OK to me, but maybe we should brainstorm other alternatives first.
Let me get back to you in a week or so. In the meantime, if you have other ideas, please let us know in this thread!

@yousifh (Contributor, Author) commented Jul 11, 2022

I was reading the linked issue; would MBRPOPLPUSH help if the dequeue happens in a Lua script? Wouldn't the Lua script itself still block the entire server while the command is waiting? This comment mentions it is planned, but I can't find that mentioned in the docs.

The other alternative approach is something like this

The idea is that if the processor finds messages, it resets the sleep to a baseline (100ms, for example); if it doesn't find any, it keeps incrementing the sleep until it hits the max of 1 sec. So while the server is busy it fetches messages quickly even if it occasionally gets an empty-queue error, and while the server is idle it keeps the same old behaviour of sleeping 1 sec.
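A minimal sketch of that shape (the names and the increment size are illustrative, not the actual diff):

package main

import (
	"fmt"
	"time"
)

const (
	baseSleep = 100 * time.Millisecond // sleep used right after a non-empty fetch
	sleepStep = 100 * time.Millisecond // back-off added per successive empty fetch
	maxSleep  = time.Second            // the current fixed sleep becomes the ceiling
)

// adaptiveSleep tracks the processor's current empty-queue sleep duration.
type adaptiveSleep struct {
	current time.Duration
}

// onTaskFound resets the sleep to the baseline so a busy server keeps
// polling quickly even if it occasionally sees an empty queue.
func (a *adaptiveSleep) onTaskFound() {
	a.current = baseSleep
}

// onEmptyQueue returns how long to sleep for this empty fetch, then backs
// off toward the 1 sec ceiling for the next one.
func (a *adaptiveSleep) onEmptyQueue() time.Duration {
	if a.current == 0 {
		a.current = baseSleep
	}
	d := a.current
	a.current += sleepStep
	if a.current > maxSleep {
		a.current = maxSleep
	}
	return d
}

func main() {
	var s adaptiveSleep
	fmt.Println(s.onEmptyQueue()) // 100ms
	fmt.Println(s.onEmptyQueue()) // 200ms
	s.onTaskFound()
	fmt.Println(s.onEmptyQueue()) // back to 100ms
}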

The numbers for the base sleep and the increments can be tweaked; I just went for something simple. During my benchmark tests this brought the average time-to-execution latency down to 75ms, an improvement over the 190ms with the default behaviour.

It tries to do adaptive flow control, but I kept it as basic as possible so as not to complicate the code.

@HenryvanderVegte commented

Hi @hibiken,

Is there any progress on this? I'm also seeing higher time-to-execution latency because of this. I think both of the solutions @yousifh described (making the sleep time configurable or backing off the sleep) would work for my scenario.
Thanks!
