Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding batching to re-queuing for timestamp #767

Merged
merged 13 commits into from
Mar 14, 2023

Conversation

brennen-stripe
Copy link
Contributor

@brennen-stripe brennen-stripe commented Mar 1, 2023

During times of high load on our system, we noticed that our queue was behaving poorly. One of the things we noticed was that our scheduled job queue was lagging behind real time. In other words, the re-scheduling was dealing with timestamps that were in the past, sometimes by as much as 15 minutes.

We added a metric around this to keep track of it, you can see an example chart below.

Screen Shot 2023-03-01 at 12 59 02 PM

After diving into the Resque-Scheduler code, it became apparent that the delay was stemming from inefficient queueing of scheduled jobs. This PR aims to fix that by adding batch scheduling, with a customizable batch size.

After sourcing my fork of this repository with this patch applied in our production environment, we have seen the delay between the scheduler and real time virtually eliminated. You can see for yourself in the chart below.

Screen Shot 2023-03-06 at 10 59 10 AM

@PatrickTulskie
Copy link
Contributor

Hey @brennen-stripe... can we separate out the feature/tests from changes to the test matrix in another PR? Also, can we get a PR description so we understand what this is changing? Thanks!

@brennen-stripe brennen-stripe changed the title Adding batching to popping from timestamp WIP: Adding batching to popping from timestamp Mar 1, 2023
@brennen-stripe brennen-stripe marked this pull request as draft March 1, 2023 17:10
@brennen-stripe
Copy link
Contributor Author

brennen-stripe commented Mar 1, 2023

@PatrickTulskie Apologies, added WIP and moved this to a draft. Will be removing the matrix changes, just wanted to sanity check some version compatibility and fix the unrelated the test failures to see green builds. I see theres another PR duplicating that work, so won't make a new one.

@brennen-stripe brennen-stripe marked this pull request as ready for review March 2, 2023 17:08
@brennen-stripe brennen-stripe changed the title WIP: Adding batching to popping from timestamp Adding batching to re-queuing for timestamp Mar 2, 2023
@brennen-stripe
Copy link
Contributor Author

@PatrickTulskie Alright this should be good to go!

@iloveitaly
Copy link
Contributor

All tests except one is passing. The one that isn't passing does pass on a previous ruby version, and it's ruby 2.3, so we can handle separately.

@alxckn
Copy link

alxckn commented Mar 22, 2024

@brennen-stripe your initial issue with lag at stripe seems to have been quite severe (multiple minutes on average).
I'm wondering how far down this fix managed to get the average/p95 down for you (few seconds?) and if you are using a custom batch size higher than the default of 100?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants