[Feature] Implement Time to live (TTL) for jobs #479
Comments
Note that we currently have the timeout option, but it does not address the two issues mentioned above.
If there is any progress on notifying the worker from the queue for a graceful shutdown, I'd like to hear about it in #484, which covers graceful shutdown of stalled jobs.
Hello,
Not really. In fact, this TTL functionality could work really well once #488 is ready, since that will allow us to actually kill a process that has exceeded its TTL, truly forcing it to stop working. These are fairly high-priority items, so expect them to be released soon.
Hello, it has been 4+ years since this ticket was opened. Is there any progress on its implementation? #488 appears to be implemented, which was supposedly a prerequisite for this feature?
It's not the best solution, but for now you can check the job status on every loop iteration in the process and abort if its status is set to failed. So to stop a job, you can just set its status to failed.
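A minimal sketch of that workaround, for illustration only. The `JobStatus` type, the `StatusLookup` callback, and `processInSteps` are hypothetical names, not part of the library's API; a real worker would query the queue's backing store for the status, most likely asynchronously. The lookup is kept synchronous here for brevity.

```typescript
type JobStatus = "active" | "failed" | "completed";

// Hypothetical status lookup; a real worker would read this from the queue's store.
type StatusLookup = (jobId: string) => JobStatus;

// Runs the steps in order, checking the job status before each one.
// Stops early as soon as the job has been marked as failed.
function processInSteps(
  jobId: string,
  steps: Array<() => void>,
  getStatus: StatusLookup
): { completedSteps: number; aborted: boolean } {
  let completedSteps = 0;
  for (const step of steps) {
    if (getStatus(jobId) === "failed") {
      return { completedSteps, aborted: true };
    }
    step();
    completedSteps++;
  }
  return { completedSteps, aborted: false };
}
```

The check only runs between steps, so a single step that never returns still cannot be interrupted this way.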
So I just added logic to handle this. All I did was add a timestamp to jobs; then, before the workers do anything else, they check the timestamp. If it's older than 120 seconds (in our case), they cancel the job. Sounds like a stupid solution, right? But the workers make short work of getting rid of all the stale jobs in a queue this way, so they can begin working on valid jobs. It takes seconds to clear out thousands of stale jobs.
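The timestamp check described above could look roughly like this. It is a sketch under assumptions: `StampedJob`, `enqueuedAt`, `isStale`, and `handleJob` are hypothetical names chosen for illustration, and the timestamp is a field the producer adds to the job itself rather than anything the library provides.

```typescript
// Hypothetical job shape: `enqueuedAt` is set by the producer when the job is added.
interface StampedJob<T> {
  id: string;
  enqueuedAt: number; // epoch milliseconds
  payload: T;
}

const MAX_AGE_MS = 120 * 1000; // 120 seconds, as in the comment above

// Returns true when the job has waited longer than maxAgeMs.
function isStale(
  job: StampedJob<unknown>,
  now: number = Date.now(),
  maxAgeMs: number = MAX_AGE_MS
): boolean {
  return now - job.enqueuedAt > maxAgeMs;
}

// Worker entry point: discard stale jobs before doing any real work.
function handleJob<T>(
  job: StampedJob<T>,
  work: (payload: T) => void,
  now: number = Date.now()
): "skipped" | "done" {
  if (isStale(job, now)) {
    return "skipped";
  }
  work(job.payload);
  return "done";
}
```

This only filters jobs at pickup time. It does nothing for a job that is already running and exceeds its TTL, which is the case the feature request is about.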
If this is done externally (e.g. outside of the scope of the worker doing the job), say in a separate script, do you know if the worker will actually be stopped/abort the job, if the job has been set to failed by the external script? If the worker keeps waiting for the job to finish (and you have a stalled job because the worker is stuck), this solution would not work for that case.
Is this to prevent jobs that have failed previously from being picked up if they are older than 120 seconds? @manast is there any progress on an official TTL solution for the new version of BullMQ? Happy to bounce around implementation ideas if there is a design proposal that needs refinement.
When adding jobs to the queue, it should be possible to define a maximum TTL. If the job takes more time to be processed than the TTL, it would be automatically failed.
A few things to consider: