Multiprocess RQ workers #4233
Conversation
bin/docker-entrypoint
exec /app/manage.py rq worker $QUEUES
sleep infinity
I wonder if it would be more robust/cleaner to use something like Honcho. This way one worker exiting/crashing won't kill the others.
And to clarify: I don't mean using the Honcho CLI, but rather using its Manager and Process classes from our code directly, skipping the need to create a Procfile. We can also implement our own logic, but I guess Honcho already took care of many of the possible issues with managing multiple processes.
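For illustration only (not code from this PR), here is a minimal sketch of that idea, assuming honcho's Manager API; the worker count and the command string are made-up placeholders:

```python
# Sketch: spawn several RQ workers via honcho's Manager instead of a Procfile,
# so the processes are supervised together from our own entrypoint code.
import sys

from honcho.manager import Manager

WORKER_COUNT = 4  # hypothetical; would come from an env var in practice


def start_workers():
    manager = Manager()
    for i in range(WORKER_COUNT):
        # Each process gets a name and a shell command, just like a Procfile entry would.
        manager.add_process("worker_%d" % i, "./manage.py rq worker $QUEUES")
    manager.loop()  # blocks; when any process exits, the others are terminated
    sys.exit(manager.returncode)


if __name__ == "__main__":
    start_workers()
```

When any one process exits, Manager terminates the rest and loop() returns, which is the "one bad worker takes the whole container down" behaviour discussed later in this thread.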
+1 for Honcho!
Using Honcho directly in 0d95d24. I like it because it will kill all workers and exit if one of the worker processes goes AWOL, but it feels a tad strange to specify a command to run within the code 🤷‍♂️
it feels a tad strange to specify a command to run within the code
I agree. I sort of expected that we could provide a function to invoke as the subprocess, but it makes sense that their code assumes you provide it with an executable path.
I like it because it will kill all workers and exit if one of the worker processes goes AWOL
Better option would be to just restart it, no?
(btw, you need to rebase 😬)
Better option would be to just restart it, no?
Yeah, it does that now.
(btw, you need to rebase 😬)
That. Was. Fun.
Now it restarts all of them instead of only the faulty one. Between the previous version (the process exits when one or more workers misbehave) and the current one (all workers restart in a loop when one of the workers misbehaves), I think the safer one is the first.
While the end result is somewhat the same, the first leaves more room for seeing what's going on. For example, with ECS we will see tasks restarting frequently.
If we want to restart only the faulty workers, we probably need to move towards something like supervisor (which was what we initially had in mind, but the purpose of this PR was to provide something lighter 😉)
Anyway, reverted the auto-restart for now.
#4371 suggests an alternative implementation using supervisor.
# Configure any SQLAlchemy mappers loaded until now so that the mapping configuration
# will already be available to the forked work horses and they won't need
# to spend valuable time re-doing that on every fork.
configure_mappers()
We should probably move this to its own PR, so it's not forgotten in case we abandon this one.
Oops :-)
Running some load tests, it feels like #4371 is the direction we want to go for multi-process workers, so I'm closing this PR for now.
What type of PR is this? (check all applicable)
Description
[Created this as a separate PR from the RQ one, because that one is already too bloated to spot anything]
This is a naive way to launch multiple worker processes, thus utilizing containers more efficiently. It relies on a healthcheck being performed continuously: when the rq healthcheck command returns 1, it means that at least one of the workers on that host is not healthy* and the container should be replaced.
I won't be surprised if we choose to replace this approach rather quickly, but this is the simplest way I can think of to get started and collect feedback.
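As a rough illustration only (not the actual implementation of the rq healthcheck command in this PR), such a check could compare the number of RQ workers registered in Redis for this host against the number we expect to be running. It assumes rq's Worker.all() and the worker hostname attribute; EXPECTED_WORKER_COUNT and REDIS_URL below are made-up placeholders:

```python
# Hypothetical sketch of a worker healthcheck: exit 0 if enough workers from this
# host are registered in Redis, exit 1 otherwise so the container gets replaced.
import os
import socket
import sys

from redis import Redis
from rq import Worker

EXPECTED_WORKER_COUNT = int(os.environ.get("WORKERS_COUNT", "4"))  # placeholder
REDIS_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")  # placeholder


def healthy():
    connection = Redis.from_url(REDIS_URL)
    hostname = socket.gethostname()
    # Workers register themselves in Redis; count the ones reporting this hostname.
    local_workers = [w for w in Worker.all(connection=connection)
                     if w.hostname == hostname]
    return len(local_workers) >= EXPECTED_WORKER_COUNT


if __name__ == "__main__":
    sys.exit(0 if healthy() else 1)
```

Wired into the container definition as a healthcheck, a non-zero exit code would let an orchestrator such as ECS replace the task.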
Related Tickets & Documents
Mobile & Desktop Screenshots/Recordings (if there are UI changes)