Multiprocess RQ workers #4233

rauchy · 2019-10-10T12:53:57Z

What type of PR is this? (check all applicable)

Refactor

Description

[Created this as a separate PR from the RQ one, because that one is already too bloated to spot anything]

This is a naive way to launch multiple worker processes in a single container, making more efficient use of each container. It relies on a healthcheck being performed continuously: when the rq healthcheck command exits with status 1, at least one of the workers on that host is not healthy* and the container should be replaced.

*A worker is healthy if it is currently busy (processing a job). If it is not busy, it is still considered healthy if it sent a heartbeat in the past 60 seconds. If it is neither busy nor has sent a heartbeat in the past 60 seconds, it is healthy only if all of the queues it watches are empty (i.e. it has nothing to do).
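The health rule above can be sketched as a pure function. This is a minimal sketch, not the PR's actual healthcheck: `WorkerSnapshot`, `is_healthy` and `healthcheck_exit_code` are hypothetical names, and it assumes each worker's state and the queue sizes have already been read from Redis (for example through RQ) elsewhere.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional

# Idle workers must have sent a heartbeat within this window.
HEARTBEAT_WINDOW = timedelta(seconds=60)


@dataclass
class WorkerSnapshot:
    """Hypothetical, already-fetched view of one worker's state."""
    name: str
    busy: bool
    last_heartbeat: Optional[datetime]
    queues: List[str] = field(default_factory=list)


def is_healthy(worker: WorkerSnapshot, queue_sizes: Dict[str, int], now: datetime) -> bool:
    # 1. A worker that is processing a job is healthy.
    if worker.busy:
        return True
    # 2. An idle worker is healthy if it sent a heartbeat in the past 60 seconds.
    if worker.last_heartbeat is not None and now - worker.last_heartbeat <= HEARTBEAT_WINDOW:
        return True
    # 3. Otherwise it is healthy only if every queue it watches is empty.
    return all(queue_sizes.get(queue, 0) == 0 for queue in worker.queues)


def healthcheck_exit_code(
    workers: Iterable[WorkerSnapshot],
    queue_sizes: Dict[str, int],
    now: Optional[datetime] = None,
) -> int:
    """Return 0 if every worker on the host is healthy, 1 otherwise."""
    now = now or datetime.now(timezone.utc)
    return 0 if all(is_healthy(w, queue_sizes, now) for w in workers) else 1
```

A non-zero result is what would tell the orchestrator to replace the container.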

I won't be surprised if we choose to replace this approach rather quickly, but this is the simplest way I can think of to get started and collect feedback.

Related Tickets & Documents

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

arikfr · 2019-10-10T14:19:10Z

bin/docker-entrypoint


-  exec /app/manage.py rq worker $QUEUES
+  sleep infinity


I wonder if it will be more robust/cleaner to use something like Honcho. This way one worker exiting/crashing won't kill the others.

And to clarify: I don't mean to use the Honcho CLI, but rather to use its Manager and Process classes directly from our code, skipping the need to define a Procfile. We could also implement our own logic, but I guess Honcho already takes care of many of the possible issues with managing multiple processes.

+1 for Honcho!

Using Honcho directly in 0d95d24. I like it because it will kill all workers and exit if one of the worker processes goes AWOL, but it feels a tad strange to specify a command to run within the code 🤷‍♂
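The fail-fast behavior described here (kill all workers and exit as soon as one goes away) can be illustrated without Honcho. This is a rough standard-library sketch, not Honcho's implementation: `run_fail_fast` is a hypothetical helper, and it simply polls the children at a fixed interval.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def run_fail_fast(commands: Sequence[Sequence[str]], poll_interval: float = 0.1) -> int:
    """Start every command; when any one exits, terminate the rest and
    return the exit code of the first process found to have stopped."""
    procs: List[subprocess.Popen] = [subprocess.Popen(list(cmd)) for cmd in commands]
    try:
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    return code
            time.sleep(poll_interval)
    finally:
        # Ask every worker that is still running to stop...
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        # ...and force-kill any that ignore the request.
        for proc in procs:
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
```

From an entrypoint, something like `sys.exit(run_fail_fast([["/app/manage.py", "rq", "worker"]] * 4))` would exit with the failing worker's status, letting the orchestrator notice and replace the container.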

it feels a tad strange to specify a command to run within the code

I agree. I sort of expected that we can provide a function to invoke as the subprocess, but it makes sense that their code assumes you provide them an executable path.

I like it because it will kill all workers and exit if one of the worker processes goes AWOL

Better option would be to just restart it, no?

(btw, you need to rebase 😬)

Better option would be to just restart it, no?

Yeah, it does that now.

(btw, you need to rebase 😬)

That. Was. Fun.

Now it restarts all of them instead of only the faulty one. Between the previous version (the process exits when one or more workers misbehave) and the current one (all workers restart in a loop when one of the workers misbehaves), I think the first is safer.

While the end result is somewhat the same, the first approach leaves more room for seeing what's going on. For example, with ECS we will see tasks restarting frequently.

If we want to restart only the faulty workers, we probably need to move towards something like supervisor (which was what we initially had in mind, but the purpose of this PR was to provide something lighter 😉)

Anyway, reverted the auto-restart for now.
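The three policies weighed in this thread (exit on the first failure, restart every worker, restart only the faulty one, as supervisor would) differ only in the decision taken when a worker dies. A hypothetical sketch of that decision as a pure function; `RestartPolicy` and `plan_restarts` are illustrative names, not part of Honcho, supervisor or Redash.

```python
from enum import Enum
from typing import Dict, List, Optional


class RestartPolicy(Enum):
    FAIL_FAST = "fail_fast"            # exit and let the container be replaced
    RESTART_ALL = "restart_all"        # restart every worker in the container
    RESTART_FAULTY = "restart_faulty"  # restart only the workers that died


def plan_restarts(exit_codes: Dict[str, Optional[int]], policy: RestartPolicy) -> Optional[List[str]]:
    """Given each worker's exit code (None means still running), return the
    workers to (re)start, or None if the supervisor itself should exit."""
    dead = sorted(name for name, code in exit_codes.items() if code is not None)
    if not dead:
        return []  # nothing to do while every worker is alive
    if policy is RestartPolicy.FAIL_FAST:
        return None
    if policy is RestartPolicy.RESTART_ALL:
        # The caller is expected to stop the still-running workers first.
        return sorted(exit_codes)
    return dead
```

Keeping the decision separate from process handling makes it easy to switch policies without touching the code that spawns the workers.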

#4371 suggests an alternative implementation using supervisor.


arikfr · 2019-11-19T10:44:46Z

redash/cli/rq.py

+    # Configure any SQLAlchemy mappers loaded until now so that the mapping configuration 
+    # will already be available to the forked work horses and they won't need 
+    # to spend valuable time re-doing that on every fork.
+    configure_mappers()


We should probably move this to its own PR, so it's not forgotten in case we abandon this one.
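The comment in the diff relies on fork semantics: state prepared in the parent before forking is copied into every child, so forked work horses skip the setup. A minimal standard-library illustration of that effect (POSIX only, since it calls `os.fork`); `configure_once` is a hypothetical stand-in for `configure_mappers()`, and nothing here touches SQLAlchemy or RQ.

```python
import os

# Module-level state standing in for SQLAlchemy's mapper configuration.
_STATE = {"configured": False, "calls": 0}


def configure_once() -> None:
    """Hypothetical stand-in for an expensive one-time setup."""
    if not _STATE["configured"]:
        _STATE["calls"] += 1
        _STATE["configured"] = True


def run_in_forked_child() -> int:
    """Fork a child that calls configure_once(); the child's exit status is
    the number of times it had to run the setup itself."""
    calls_before = _STATE["calls"]
    pid = os.fork()
    if pid == 0:
        # Child: inherits a copy of the parent's _STATE at fork time.
        configure_once()
        os._exit(_STATE["calls"] - calls_before)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Without preloading, every child pays for the setup; after calling `configure_once()` in the parent, children inherit the configured state and do no extra work.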

rauchy · 2019-11-28T08:58:19Z

After running some load tests, #4371 feels like the direction we want to go with for multi-process workers, so I'm closing this PR for now.

rauchy requested a review from arikfr October 10, 2019 12:53

arikfr reviewed Oct 10, 2019


weekly-digest bot mentioned this pull request Oct 14, 2019

Weekly Digest (7 October, 2019 - 14 October, 2019) #4241

Closed

weekly-digest bot mentioned this pull request Oct 21, 2019

Weekly Digest (14 October, 2019 - 21 October, 2019) #4271

Closed

rauchy changed the base branch from rq to master October 27, 2019 12:06

rauchy force-pushed the multi-process-rq-workers branch from 0d95d24 to 32c989e October 27, 2019 20:38

rauchy requested a review from arikfr October 27, 2019 20:39

rauchy force-pushed the multi-process-rq-workers branch from 72612ee to 99d8cb3 October 28, 2019 07:19

Omer Lachish added 20 commits October 29, 2019 15:34

add rq and an rq_worker service

824977b

add rq_scheduler and an rq_scheduler service

7a59a1a

move beat schedule to periodic_jobs queue

4a7c7b7

move version checks to RQ

965ad8a

move query result cleanup to RQ

13c2891

move custom tasks to RQ

2b070f3

do actual schema refreshes in rq

6479c1f

move send_email to rq

c368833

DRY up enqueues

af94ea9

ditch and use a partially applied decorator

d9f6d31

👋 beat

8257983

rename rq_scheduler to plain scheduler, now that there's no Celery scheduler entrypoint

c906b36

add logging context to rq jobs (while keeping execute_query context via get_task_logger for now)

6b72ff6

move schedule to its own module

6073a41

cancel previously scheduled periodic jobs. not sure this is a good idea.

a627080

rename redash.scheduler to redash.schedule

1581920

allow custom dynamic jobs to be added decleratively

67ff9e1

pleasing the CodeClimate overlords

4116fe5

adjust cypress docker-compose.yml to include rq changes

a5adbf3

DRY up Cypress docker-compose

97a3dac

Omer Lachish added 21 commits October 29, 2019 15:34

show docker-compose logs at Cypress shutdown

f714dd6

Revert "DRY up Cypress docker-compose"

6c40c4e

This reverts commit 43abac7.

minimal version for binding is 3.2

77df227

remove unneccesary code reloads on cypress

0540472

SCHEMAS_REFRESH_QUEUE is no longer a required setting

ec5afb4

split tasks/queries.py to execution.py and maintenance.py

814110f

rename worker to celery_worker and rq_worker to worker

f7fa7dc

delete all existing periodic jobs before scheduling them

47c5287

remove some unrequired requires

c3d43c7

move schedule example to redash.schedule

a16e392

pleasing the CodeClimate overlords

2aae335

revert to calling a function in dynamic settings to allow periodic jobs to be scheduled after app has been loaded

0be56a5

set the timeout_ttl to double the interval to avoid job results from expiring and having periodic jobs not reschedule

5c499c8

a naive way to launch multiple workers in one container

9f5a68e

updated and less brittle healthcheck

1e44681

describe custom jobs and don't actually schedule them

7479493

launch multiple workers with Honcho

91a2eff

fix my faulty rebase

d0e3721

restart all workers when a worker dies

770eaa2

Revert "restart all workers when a worker dies"

306e258

This reverts commit 32c989e.

optimize work horse initialization by configuration mappers on the worker process

d5c5411

rauchy force-pushed the multi-process-rq-workers branch from 5b5341c to d5c5411 October 29, 2019 13:35

weekly-digest bot mentioned this pull request Nov 4, 2019

Weekly Digest (28 October, 2019 - 4 November, 2019) #4334

Closed

rauchy mentioned this pull request Nov 19, 2019

Multiprocess RQ workers (using supervisor) #4371

Merged

1 task

arikfr reviewed Nov 19, 2019


weekly-digest bot mentioned this pull request Nov 25, 2019

Weekly Digest (18 November, 2019 - 25 November, 2019) #4398

Closed

rauchy closed this Nov 28, 2019

guidopetri deleted the multi-process-rq-workers branch July 22, 2023 03:18


rauchy commented Oct 10, 2019

arikfr Oct 10, 2019

jezdez Oct 17, 2019

rauchy Oct 27, 2019

arikfr Oct 27, 2019

rauchy Oct 27, 2019 (edited)

arikfr Oct 27, 2019

rauchy Oct 27, 2019

rauchy Oct 27, 2019

rauchy Nov 19, 2019

arikfr Nov 19, 2019

rauchy Nov 19, 2019

arikfr Nov 19, 2019

rauchy commented Nov 28, 2019
