Best practices in deploying and monitoring a queue #523
Hi @stas - thank you for your work on jsonapi-serializer!
I ran into a similar issue in dev a while back. If I remember correctly, I had failed to account for the GoodJob LISTEN/NOTIFY thread when setting the database pool size. I think the latest recommendation (@bensheldon - please correct me if I am wrong) is this:
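In broad strokes, the recommendation is to size the Active Record connection pool for GoodJob's execution threads plus its LISTEN/NOTIFY connection, with a little headroom. A minimal config/database.yml sketch (the env var default and the +2 headroom here are illustrative, not an official formula):

```yaml
# config/database.yml (sketch): the pool must cover GoodJob's execution
# threads plus the LISTEN/NOTIFY connection, plus anything else in the
# process that touches Active Record.
production:
  adapter: postgresql
  pool: <%= ENV.fetch("GOOD_JOB_MAX_THREADS", 5).to_i + 2 %>
```

Whatever the exact numbers, the pool needs to be at least as large as the number of threads in the process that can touch the database at the same time.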
@stas thanks for opening the issue and sorry for the trouble! It shouldn't (typically) be necessary to manage the database connection pool yourself. GoodJob wraps each job with a Rails reloader that will check out and check back in database connections (see good_job/lib/good_job/scheduler.rb, lines 266 to 275 at 2edcd02).
Edit: just saw @reczy's comment. That is my first thought of where the culprit lies.
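For illustration, here is a simplified sketch of that pattern (not GoodJob's actual scheduler code; ExampleJob is a hypothetical Active Job class). Wrapping the work in the Rails reloader/executor is what returns checked-out connections to the pool once the job finishes:

```ruby
# Simplified sketch of the wrap-and-release pattern described above.
Rails.application.reloader.wrap do
  # Anything in here that touches Active Record checks a connection out of
  # the pool for the duration of the block.
  ExampleJob.perform_now
end
# When the block completes, the executor's completion hooks return the
# connection to the pool, so no manual check-in is needed.
```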
Thanks @reczy @bensheldon. We're running GJ on a dedicated instance, so there's nothing but the workers consuming the pool connections. From what I understand, running the default setup of GoodJob would use 5 execution threads plus the LISTEN/NOTIFY thread, is that correct?
I'm not sure I understand the 20% margin though. @reczy, do you think you can help clarify where this is coming from?

@bensheldon, looking at the scheduler implementation, from what I can understand, every new job that has to be processed in the future will get a thread and wait to be processed. Is my assumption correct? That would explain how we managed to drain the connection pool, as we have a lot of jobs that get scheduled, and it seems like those are just sitting there and waiting. We had a similar issue in … Where …

Appreciate your help here!!! 🙇
Yes, that would use 6 threads total (5 execution + 1 LISTEN/NOTIFY), assuming that you're using the default configuration.
Slightly different. The …

Regarding the 20% margin, I'm less confident that there isn't an extra thread or two that lingers a second longer than expected (though I'm hopeful someone would have reported it by now, and I've been running GoodJob successfully across multiple production applications). I've also seen multiple times where multithreaded code is introduced into an application without awareness that each thread will consume a database connection. I don't see a benefit to running extra lean on configured database connections.
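To make that last point concrete (a generic sketch, unrelated to GoodJob's own code): every thread that touches Active Record holds its own connection while it has one checked out, so ad-hoc threads quietly eat into the same pool the job workers rely on.

```ruby
# Each thread below checks out its own connection from the pool while the
# block runs; with a pool of 6 shared by 5 execution threads and the
# LISTEN/NOTIFY thread, even a handful of extra threads can exhaust it.
threads = Array.new(3) do
  Thread.new do
    ActiveRecord::Base.connection_pool.with_connection do |connection|
      connection.execute("SELECT pg_sleep(1)") # holds the connection for ~1s
    end
  end
end
threads.each(&:join)
```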
Got it! Thanks a lot @bensheldon. Would you be OK to merge a PR if I prepare one with a note about how to set up GoodJob in production?
@stas definitely. Thank you.
* Added a chapter on how to prepare for production. See #523 for more feedback.
* Fix lint errors and move db connection details into section

Co-authored-by: Ben Sheldon <bensheldon@gmail.com>
Hi there,
first of all, thank you for all the work that everyone has dedicated to this project!!! ❤️
The reason I'm reaching out is that we're running a bunch of queues with GoodJob on Heroku, and recently we started seeing jobs piling up and not being processed.
After a bit of digging, we found that GoodJob would end up out of available connections in the pool (which is totally fair).
But what's scary is that the running processes would be reported as healthy by Heroku, even though the performance of the queue degraded (e.g. 500+ jobs pending for 10-20 minutes, while normally it would take minutes to process such a load)!
Based on this, my immediate takeaways are:
Anybody else had to go through something similar?
Should we explicitly try to release any connection back into the pool? Should we use the built-in health check instead of the generic PID-based health check?
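On the last two questions: per the discussion above, you shouldn't normally need to release connections yourself, because the reloader/executor wrapping already checks them back in after each job. A queue-aware health check is generally more honest than a PID check, since the process can be alive while the pool is exhausted and jobs pile up. As a rough, hypothetical sketch of one signal such a check could watch (run inside the Rails app, e.g. from the worker process):

```ruby
# Hypothetical queue-aware health signal: report unhealthy when the
# Active Record pool is saturated, rather than only checking the PID.
def pool_healthy?
  stats = ActiveRecord::Base.connection_pool.stat
  # stats includes :size, :connections, :busy, :dead, :idle, :waiting, ...
  stats[:waiting].zero? && stats[:busy] < stats[:size]
end

puts(pool_healthy? ? "ok" : "connection pool saturated")
```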