Forever retry when job timeout #235

Closed
ngyuki opened this issue Mar 23, 2019 · 12 comments · Fixed by #284

ngyuki commented Mar 23, 2019

Hi maintainer.

I have the following job class:

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Queue\SerializesModels;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

class SampleJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function handle()
    {
        sleep(10); // runs longer than the worker --timeout of 3 seconds
    }
}
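
For reference, the same limits can also be declared on the job class itself via Laravel's standard $tries and $timeout properties (illustration only; this report uses the CLI options):

class SampleJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $tries = 3;    // maximum attempts, like --tries=3
    public $timeout = 3;  // seconds, like --timeout=3

    public function handle()
    {
        sleep(10);
    }
}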

When the job times out, it ignores the --tries option and retries forever:

$ while :; do php artisan queue:work --delay=1 --sleep=1 --tries=3 --timeout=3 rabbitmq; done
[2019-03-23 00:14:19][5c957a5b1b0b04.32221024] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:14:24][5c957a5b1b0b04.32221024] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:14:29][5c957a5b1b0b04.32221024] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:14:35][5c957a5b1b0b04.32221024] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:14:40][5c957a5b1b0b04.32221024] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:14:45][5c957a5b1b0b04.32221024] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:14:50][5c957a5b1b0b04.32221024] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:14:55][5c957a5b1b0b04.32221024] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:15:01][5c957a5b1b0b04.32221024] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:15:06][5c957a5b1b0b04.32221024] Processing: App\Jobs\SampleJob
Killed
:
:
:

With the redis driver, by contrast, the job fails after the number of attempts given by --tries:

$ while :; do php artisan queue:work --delay=1 --sleep=1 --tries=3 --timeout=3 redis; done
[2019-03-23 00:17:13][dRI4jvP7vNZlMw2rnJbZgiPWf1XlyH5C] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:18:43][dRI4jvP7vNZlMw2rnJbZgiPWf1XlyH5C] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:20:14][dRI4jvP7vNZlMw2rnJbZgiPWf1XlyH5C] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:21:43][dRI4jvP7vNZlMw2rnJbZgiPWf1XlyH5C] Processing: App\Jobs\SampleJob
[2019-03-23 00:21:43][dRI4jvP7vNZlMw2rnJbZgiPWf1XlyH5C] Failed:     App\Jobs\SampleJob

ngyuki commented Mar 23, 2019

Workaround

Set the queue's x-message-ttl via the RABBITMQ_QUEUE_ARGUMENTS environment variable:

# .env
RABBITMQ_QUEUE_ARGUMENTS={"x-message-ttl":60000}

The message is deleted from the queue after the specified number of milliseconds, so it is not retried forever:

$ while :; do php artisan queue:work --delay=1 --sleep=1 --tries=3 --timeout=3 rabbitmq; done

[2019-03-23 00:49:54][5c9582b1b9d359.01672694] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:49:59][5c9582b1b9d359.01672694] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:50:04][5c9582b1b9d359.01672694] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:50:09][5c9582b1b9d359.01672694] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:50:14][5c9582b1b9d359.01672694] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:50:19][5c9582b1b9d359.01672694] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:50:25][5c9582b1b9d359.01672694] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:50:30][5c9582b1b9d359.01672694] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:50:35][5c9582b1b9d359.01672694] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:50:40][5c9582b1b9d359.01672694] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:50:46][5c9582b1b9d359.01672694] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 00:50:51][5c9582b1b9d359.01672694] Processing: App\Jobs\SampleJob
Killed

However, this method cannot detect that the job has failed.
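
For reference, a minimal php-amqplib sketch of what this argument amounts to at the AMQP level (connection details and queue name are assumptions):

<?php

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Wire\AMQPTable;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Equivalent of RABBITMQ_QUEUE_ARGUMENTS={"x-message-ttl":60000}:
// the broker drops any message that sits in the queue longer than 60s.
$channel->queue_declare('default', false, true, false, false, false, new AMQPTable([
    'x-message-ttl' => 60000,
]));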


ngyuki commented Mar 23, 2019

Workaround 2

Set the queue's x-message-ttl, x-dead-letter-exchange, and x-dead-letter-routing-key via the RABBITMQ_QUEUE_ARGUMENTS environment variable:

# .env
RABBITMQ_QUEUE_ARGUMENTS={"x-message-ttl":60000,"x-dead-letter-exchange":"","x-dead-letter-routing-key":"failed"}

Then, on the JobProcessing event, fail any job that arrives on the failed queue:

# AppServiceProvider.php
<?php

namespace App\Providers;

use Illuminate\Queue\Events\JobProcessing;
use Illuminate\Queue\MaxAttemptsExceededException;
use Illuminate\Queue\QueueManager;
use Illuminate\Support\ServiceProvider;

class AppServiceProvider extends ServiceProvider
{
    public function boot(QueueManager $qm)
    {
        // Runs before every job is processed.
        $qm->before(function (JobProcessing $ev) {
            // A message on the "failed" queue got there via the
            // dead-letter exchange, i.e. it expired on the work queue.
            if ($ev->job->getQueue() === 'failed') {
                $ev->job->fail($e = new MaxAttemptsExceededException(
                    $ev->job->resolveName() . ' has entered dead letter.'
                ));
                throw $e;
            }
        });
    }
}

A job sent to the dead-letter queue will then be marked as failed:

$ while :; do php artisan queue:work --delay=1 --sleep=1 --tries=3 --timeout=3 --queue default,failed rabbitmq; done
[2019-03-23 02:02:18][5c9593a9e97648.96240097] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 02:02:24][5c9593a9e97648.96240097] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 02:02:29][5c9593a9e97648.96240097] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 02:02:34][5c9593a9e97648.96240097] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 02:02:39][5c9593a9e97648.96240097] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 02:02:44][5c9593a9e97648.96240097] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 02:02:50][5c9593a9e97648.96240097] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 02:02:55][5c9593a9e97648.96240097] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 02:03:00][5c9593a9e97648.96240097] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 02:03:05][5c9593a9e97648.96240097] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 02:03:11][5c9593a9e97648.96240097] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 02:03:16][5c9593a9e97648.96240097] Processing: App\Jobs\SampleJob
Killed
[2019-03-23 02:03:21][5c9593a9e97648.96240097] Failed:     App\Jobs\SampleJob

@bromberglab-services

+1 - same issue here

vyuldashev added the bug label Apr 16, 2019
@vyuldashev (Owner)

I will look into it.

@deadpony

Still an issue.

@annafleming

Experiencing the same issue

@mrvtoney

I have a fix for this issue. What are the requirements to create a PR?

@vyuldashev (Owner)

@mrvtoney No formal rules for now. Just submit a PR with a short explanation and I will review it.

@vyuldashev (Owner)

I have a solution, but it requires a rework of the interaction flow with AMQP.
Currently, I am working hard on v10, which will correct the interaction with AMQP and bring some other good stuff.

vyuldashev mentioned this issue Nov 25, 2019 (merged)
@vyuldashev (Owner)

v10 is released. Go and try it!

vyuldashev added a commit that referenced this issue Dec 12, 2019
@bogutskyy

It's still happening for me. :(

adm-bome (Collaborator) commented May 3, 2020

That's correct, and by design.

Normally a job processes to completion: either it succeeds, or it fails with an error thrown in the code. When it fails and throws an error, the message is retried x times before it is definitively marked as failed.

When a worker times out, it's almost impossible to know why that happened; workers are managed and operate on system signals (alarms).

It's safe to say the job was not processed/tried at all.

To avoid losing the message when the worker is killed, the message is simply rejected and requeued. In RabbitMQ this is called redelivery, and the message is flagged with a boolean redelivered tag.

It is impossible to count how many times redelivery has taken place.
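
For illustration, a minimal php-amqplib sketch of what is (and is not) available on a delivered message (the consumer callback is an assumption; a count only exists once a message has been dead-lettered, as in workaround 2 above):

<?php

use PhpAmqpLib\Message\AMQPMessage;

$callback = function (AMQPMessage $message) {
    // The broker only exposes a boolean: true after any reject+requeue,
    // with no count of how many redeliveries actually happened.
    $redelivered = $message->delivery_info['redelivered'];

    // A count only appears after dead-lettering: RabbitMQ then adds an
    // x-death header whose entries carry a 'count' field.
    $headers = $message->get_properties()['application_headers'] ?? null;
    $deaths  = $headers !== null ? ($headers->getNativeData()['x-death'] ?? []) : [];
    $count   = $deaths[0]['count'] ?? 0;
};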

If you run into timeout issues with your workers because jobs take too long to process:

  • set a wider timeout;
  • rework your jobs to process faster;
  • rework the job into smaller jobs and chain them (see the sketch below).
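
A minimal Laravel chaining sketch for the last point (the job names are hypothetical):

<?php

use App\Jobs\DownloadData;   // hypothetical smaller jobs
use App\Jobs\TransformData;
use App\Jobs\StoreResult;

// Each job handles a small slice of the work and finishes well within
// the worker timeout; each link in the chain runs only if the previous
// one succeeded.
DownloadData::withChain([
    new TransformData,
    new StoreResult,
])->dispatch();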

The timeout parameter exists only so that broken/dead workers get removed and another process/worker can be started.

Also note: when the timeout is set too low (I think below 10 seconds) the workers get killed. That has something to do with the system internals.

@vyuldashev What do you think?
