Worker can get caught in infinite loop when redis connection is closed unexpectedly #389
Comments
Also, I think this is actually the root cause of #359, not pausing! The event loop starvation happens because my timeouts for a test (or what have you) never fire due to this async-but-infinite loop, which I think has higher priority in Node's tick order.
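For illustration, here is a minimal Node/TypeScript sketch (not bullmq code) of the starvation effect described above: a loop that only ever awaits already-resolved promises keeps the microtask queue busy, so timer callbacks never get a turn.

```ts
// Minimal sketch (not bullmq code): an async-but-infinite loop that only ever
// awaits already-resolved promises. Each await re-queues a microtask, the
// microtask queue never drains, and the event loop never reaches the timers
// phase, so the timeout below never fires.
async function spin(): Promise<void> {
  while (true) {
    await Promise.resolve();
  }
}

setTimeout(() => console.log("this timeout never fires"), 100);
void spin();
```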
I am working on a fix for this. This issue exists in bull v3 and I expect to have a fix by tomorrow.
## [1.14.3](v1.14.2...v1.14.3) (2021-02-07)

### Bug Fixes

* **worker:** avoid possible infinite loop, fixes [#389](#389) ([d05566e](d05566e))
🎉 This issue has been resolved in version 1.14.3 🎉 The release is available on: Your semantic-release bot 📦🚀
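For context, here is a hypothetical sketch of one way such a retry loop can be broken. This is an assumption about the general approach, not the actual d05566e patch, and all names (`fetchJob`, `isClosing`, `emitError`) are stand-ins rather than bullmq APIs: only swallow the connection-closed error when the worker is actually closing, and otherwise surface it and back off before the run loop retries.

```ts
// Hypothetical sketch, not the actual d05566e patch. All names here
// (fetchJob, isClosing, emitError) are stand-ins, not bullmq APIs.
async function getNextJobSafely(
  fetchJob: () => Promise<string>,   // stand-in for the blocking BRPOPLPUSH fetch
  isClosing: () => boolean,          // stand-in for checking this.closing
  emitError: (err: Error) => void,   // stand-in for emitting a worker 'error' event
  retryDelayMs = 1000,
): Promise<string | undefined> {
  try {
    return await fetchJob();
  } catch (err) {
    if (isClosing()) {
      // Expected: the connection was closed because the worker is shutting down.
      return undefined;
    }
    // Unexpected connection loss: surface the error and back off before the
    // run loop tries again, instead of returning immediately and spinning.
    emitError(err as Error);
    await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
    return undefined;
  }
}
```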
Thanks @manast!
Because of https://github.com/taskforcesh/bullmq/blob/master/src/classes/worker.ts#L192-L195, I am seeing workers get caught in an infinite loop of trying to get the next job only to error again trying to get the next job.
The loop is this:

1. `this.closing` inside the worker is undefined
2. the `run` loop calls `getNextJob` asynchronously, which calls `waitForJob`, which calls `BRPOPLPUSH`
3. `BRPOPLPUSH` fails because the connection is closed; the error propagates back up through `waitForJob`, is caught in `getNextJob`, and swallowed
4. the swallowed `getNextJob` call wins the `Promise.race` in the worker `run` loop
5. that `getNextJob` call returns nothing, so the worker doesn't work the job, and the loop starts over

I am not sure why that error is being swallowed, or why the `Connection is closed.` message is special. If I had to guess, special care has to be taken to handle connection closes that we expect on blocking calls like that. In this case, though, the worker has not been explicitly closed, and it's being asked to use a closed redis connection, which I think should be an error that at least gets emitted and maybe takes down the process if unhandled.

Happy to do up a PR if someone can tell me what the semantics should be!
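To make the reported pattern concrete, here is a minimal TypeScript sketch with hypothetical names (it is not the actual worker.ts code): the blocking fetch rejects with `Connection is closed.`, the rejection is swallowed into `undefined`, and the run loop retries immediately.

```ts
// Illustrative sketch of the reported pattern (hypothetical names, not worker.ts):
// the blocking fetch rejects, the rejection is swallowed, and the run loop
// retries immediately, spinning forever without ever really waiting.
async function waitForJob(): Promise<string> {
  // Stand-in for BRPOPLPUSH against a connection that has already been closed.
  throw new Error("Connection is closed.");
}

async function getNextJob(): Promise<string | undefined> {
  try {
    return await waitForJob();
  } catch {
    // Swallowed: the caller never learns the connection is gone.
    return undefined;
  }
}

async function run(closing?: Promise<void>): Promise<void> {
  while (closing === undefined) {
    // In the real worker this sits behind a Promise.race; here the swallowed
    // rejection resolves immediately to undefined, so the await never waits
    // and the loop spins on the microtask queue.
    const job = await getNextJob();
    if (job !== undefined) {
      // work the job (never reached in this sketch)
    }
  }
}

void run();
```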