Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiprocess RQ workers #4233

Closed
wants to merge 43 commits into from
Closed

Multiprocess RQ workers #4233

wants to merge 43 commits into from

Conversation

rauchy
Copy link
Contributor

@rauchy rauchy commented Oct 10, 2019

What type of PR is this? (check all applicable)

  • Refactor

Description

[Created this as a separate PR from the RQ one, because that one is already too bloated to spot anything]

This is a naive way to launch multiple worker processes, thus utilizing containers more efficiently. It relies on a healthcheck being performed continuously - when the rq healthcheck command returns 1, it means that at least one of the workers in that host is not healthy* and the container should be replaced.

  • A healthy worker is either currently busy (processing a job). If it is not busy, it is still considered healthy if it sent a heartbeat in the past 60 seconds. If it is neither busy or sent a heartbeat in the past 60 seconds, it is healthy if all of the queues it watches are empty (i.e. it has nothing to do).

I won't be surprised if we choose to replace this approach rather quickly, but this is the simplest way I can think of to get started and collect feedback.

Related Tickets & Documents

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

@rauchy rauchy requested a review from arikfr October 10, 2019 12:53

exec /app/manage.py rq worker $QUEUES
sleep infinity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it will be more robust/cleaner to use something like Honcho. This way one worker exiting/crashing won't kill the others.

And to clarify: I don't mean to use the Honcho CLI, but rather use its Manager and Process classes from our code directly, skipping the need to define create a Procfile. We can also implement our own logic, but I guess Honcho took care of many of the possible issues with managing multiple processes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for Honcho!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Honcho directly in 0d95d24. I like it because it will kill all workers and exit if one of the worker processes goes AWOL, but it feels a tad strange to specify a command to run within the code 🤷‍♂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it feels a tad strange to specify a command to run within the code

I agree. I sort of expected that we can provide a function to invoke as the subprocess, but it makes sense that their code assumes you provide them an executable path.

I like it because it will kill all workers and exit if one of the worker processes goes AWOL

Better option would be to just restart it, no?

(btw, you need to rebase 😬)

Copy link
Contributor Author

@rauchy rauchy Oct 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better option would be to just restart it, no?

Yeah, it does that now.

(btw, you need to rebase 😬)

That. Was. Fun.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it restarts all of them instead of only the faulty one. Between the previous version (process exits when one or more workers misbehave) and current one (all workers restart in a loop when one of the workers misbehave), I think the safer one is the first.

While the end result is somewhat the same, it leaves more room for seeing what's going on. For example with ECS, we will see tasks restarting frequently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to restart only the faulty workers, we probably need to move towards something like supervisor (which was what we initially had in mind, but the purpose of this PR was to provide something lighter 😉)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, reverted the auto-restart for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4371 suggests an alternative implementation using supervisor.

@rauchy rauchy changed the base branch from rq to master October 27, 2019 12:06
@rauchy rauchy force-pushed the multi-process-rq-workers branch from 0d95d24 to 32c989e Compare October 27, 2019 20:38
@rauchy rauchy requested a review from arikfr October 27, 2019 20:39
@rauchy rauchy force-pushed the multi-process-rq-workers branch from 72612ee to 99d8cb3 Compare October 28, 2019 07:19
# Configure any SQLAlchemy mappers loaded until now so that the mapping configuration
# will already be available to the forked work horses and they won't need
# to spend valuable time re-doing that on every fork.
configure_mappers()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably move this to its own PR, so it's not forgotten in case we abandon this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙊#4314

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops :-)

@rauchy
Copy link
Contributor Author

rauchy commented Nov 28, 2019

Running some load tests, it feels like #4371 is the direction we want to go with for multi-process workers, so I'm closing this PR for now.

@rauchy rauchy closed this Nov 28, 2019
@guidopetri guidopetri deleted the multi-process-rq-workers branch July 22, 2023 03:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants