Rewrite the handling of process stalled jobs to be atomic so it doesn… #358

Closed

Conversation

bradvogel
Contributor

…'t double-process jobs. See #356 for more on how this can happen.

The change here is to use a Lua script to atomically iterate the active queue AND acquire the lock. This ensures that the job is actually in 'active' when it's being processed, which fixes #356. If a stalled job is found, the handler calls itself recursively until all stalled jobs have been processed. This strategy is also slightly more efficient than before, because the jobs are iterated entirely within a single script, reducing the back-and-forth with the Redis server.
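
A minimal sketch of how such a script might look, assuming an illustrative key layout and lock-key format (these are not Bull's actual internals):

```lua
-- Atomically scan the 'active' list and try to take the lock for any
-- job whose lock is missing, i.e. a stalled job. Because this all runs
-- inside one Redis script, the job cannot leave 'active' between the
-- membership check and the lock acquisition.
-- KEYS[1] = active list key
-- ARGV[1] = lock token, ARGV[2] = lock TTL in ms
local activeJobs = redis.call("LRANGE", KEYS[1], 0, -1)
for _, jobId in ipairs(activeJobs) do
  local lockKey = "lock:" .. jobId  -- hypothetical lock key format
  -- SET NX fails if another worker still holds the lock, so live jobs
  -- are skipped and only genuinely stalled jobs are claimed.
  if redis.call("SET", lockKey, ARGV[1], "PX", ARGV[2], "NX") then
    return jobId  -- this stalled job is now ours to process
  end
end
return nil  -- no stalled jobs found
```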

Additionally, this addresses a long-standing issue (#258) where a job could be considered "stalled" even though it had just been moved to 'active' and the worker that moved it had not yet obtained the lock. The fix is to wait a grace period (defaulting to LOCK_RENEW_TIME) before considering a job possibly stalled, which gives the (real) worker time to acquire its lock after moving the job to 'active'.
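
The grace-period test could be folded into the same script; here is a sketch of that check in isolation, assuming a hypothetical activatedAt field stamped on the job hash when it is moved to 'active':

```lua
-- Returns true if the job has been in 'active' longer than the grace
-- period (default LOCK_RENEW_TIME) and may therefore be stalled.
local function pastGracePeriod(jobKey, nowMs, graceMs)
  local activatedAt = tonumber(redis.call("HGET", jobKey, "activatedAt"))
  -- No timestamp recorded: fall back to treating the job as old enough.
  if not activatedAt then return true end
  return (nowMs - activatedAt) >= graceMs
end
```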

bradvogel added a commit to mixmaxhq/bull that referenced this pull request Oct 15, 2016
@bradvogel
Contributor Author

Also, per #311 (comment), should a stalled job be moved back to 'wait' to be picked up again, rather than processed immediately as this PR does? That would speed up processing these jobs immensely.

@bradvogel
Contributor Author

Closing this in favor of #359, which puts jobs back into the 'wait' queue. It's much faster.
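
For reference, the requeue approach boils down to an atomic move between two lists; a rough sketch (key layout illustrative, not the actual #359 implementation):

```lua
-- KEYS[1] = active list, KEYS[2] = wait list, ARGV[1] = stalled job id
-- Doing both steps in one script keeps the move atomic: no other client
-- can observe the job in both lists or in neither.
redis.call("LREM", KEYS[1], 0, ARGV[1])  -- drop the job from 'active'
redis.call("RPUSH", KEYS[2], ARGV[1])    -- requeue it for any idle worker
```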

@bradvogel bradvogel closed this Oct 15, 2016
@wearhere wearhere deleted the better-process-stalled-jobs branch October 16, 2016 07:10