
Does the job.process function prevent a job from stalling #299

Closed
timcosta opened this issue May 16, 2016 · 12 comments

@timcosta

My question is basically what's in the title. I'm trying to use Bull as a queue to pipe video streams from one source to another, think moving videos from one storage area to another. When these videos grow large, Bull occasionally reports the job as stalled and restarts it, even though the upload is still in progress and I am calling job.progress every 10 seconds or so. Is there a way to have job.progress act as the "check-in" for the job so that it doesn't get reported as stalled?
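
(Roughly, the shape of what we're doing is the sketch below; the queue wiring and the startUpload helper are placeholders rather than our actual code, but it shows where the progress "check-in" happens.)

var Queue = require('bull');

// Placeholder wiring -- the real queue name and redis config differ.
var videoQueue = Queue('video transfer', 6379, '127.0.0.1');

videoQueue.process(function (job, done) {
  // startUpload is a stand-in for the streaming upload logic.
  startUpload(job.data, {
    // Fires roughly every 10 seconds as upload parts complete.
    onProgress: function (part) {
      job.progress(part); // the intended "check-in"
    },
    onDone: done
  });
});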

@manast
Member

manast commented May 16, 2016

You will need to implement your operation as non-blocking IO, using fibers or some other asynchronous mechanism (streams, maybe); it depends on your use case.

@timcosta
Author

// Inside the queue's process handler; `job` and `done` come from Bull,
// `http` is Node's http module and `s3` is an aws-sdk S3 instance.
http.get(url, function onResponse(res) {
    // Stream the HTTP response body straight into a managed S3 upload.
    var managedUpload = s3.upload({
        Body: res,
        ACL: 'public-read',
        Bucket: 'test-bucket',
        Key: domain + '/prod/video/' + entryId + '.' + fileExt
    });
    managedUpload.on('httpUploadProgress', function (progress) {
        // Report the current multipart part number back to Bull.
        job.progress(progress.part);
    });
    managedUpload.send(function (err, data) {
        if (err) {
            return done(err); // err is already an Error from the aws-sdk
        }
        console.log("Done.");
        return done();
    });
});

So this is what we're doing. http is a reference to the Node http library, and s3 is an S3 instance from the official aws-sdk. From what I understand, this is passing streams back and forth, which is non-blocking IO. I see progress events registering, and then the job stalls and gets retried while the previous upload is still going. Here's what my logs look like:

Attempting to pipe videoX from Y to S3
STALLED
Attempting to pipe videoX from Y to S3
Done.
Done.

The job completes twice, and the Matador web UI shows the progress fluctuating depending on which upload reported progress most recently.

@manast
Member

manast commented May 17, 2016

That's very weird, I will look more deeply into it.

@manast manast added the bug label May 17, 2016
@manast manast added this to the 1.0 milestone May 17, 2016
@xdc0
Contributor

xdc0 commented Jun 15, 2016

@tjsail33 Bull will emit the stalled event when it detects a job is stalled. Do you see this event being triggered? You can register a listener by doing this:

queue.on('stalled', function (job) {
  console.log('Job %s is stalled', job.jobId);
});

@xdc0
Contributor

xdc0 commented Jun 15, 2016

Note: currently the loop that checks for stalled jobs can actually pick up a job for its first processing run, so this may produce false positives.

I think a better signal is to listen for:

queue.on('active', function (job) {
});

That event signals a job that has just started processing. If you see multiple stalled events firing for the same job id, together with multiple active events, then Bull is retrying a job that it shouldn't be retrying.
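
For example, something like this (purely illustrative) logs both events side by side so you can see whether a single job id goes active more than once:

// Illustrative only: count how many times each job id goes active.
var activations = {};

queue.on('active', function (job) {
  activations[job.jobId] = (activations[job.jobId] || 0) + 1;
  console.log('Job %s active (activation #%d)', job.jobId, activations[job.jobId]);
});

queue.on('stalled', function (job) {
  console.log('Job %s reported as stalled', job.jobId);
});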

@manast
Member

manast commented Jun 16, 2016

I will close this issue for now since we are lacking a response from the submitter.

@manast manast closed this as completed Jun 16, 2016
@timcosta
Author

timcosta commented Jun 16, 2016

Hey @manast, sorry for the lack of instant response, but I was essentially sleeping for 80% of the time that passed between comments.

The answer is that yes, there were multiple active events for the same job. Here's what the output looked like:

1: active
1: Uploading to S3
1: Part 1
1: Part 2
1: Part 3
1: stalled
1: Part 4
1: active
1: Uploading to S3
1: Part 5
1: Part 1
1: Part 6
1: Part 2

It was processing the same job twice simultaneously.

@manast
Member

manast commented Jun 16, 2016

What I find strange here is that the job got stalled to begin with. That should not happen if the event loop has not been blocked. Any chance you could post the whole process function? Also, do you have any other code running in the same Node process? Did you also try with the latest version, 1.0.0-rc3?
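
To illustrate what I mean by blocking: the stall detection should only misfire if the process function keeps the event loop busy, for example with long synchronous work as in this generic sketch (not a claim about your code):

// Generic example of a processor that blocks the event loop.
queue.process(function (job, done) {
  // A long synchronous loop keeps Node from running any timers or I/O
  // callbacks, so Bull cannot keep the job marked as active and may flag
  // it as stalled even though it is still "working".
  var sum = 0;
  for (var i = 0; i < 1e10; i++) {
    sum += i;
  }
  done();
});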

@timcosta
Author

timcosta commented Jun 16, 2016

Unfortunately I can't share the rest of the process function due to IP restrictions at work. This was the only job that we had running at the time, and there was nothing else being done by the Node process other than processing these files. I am not able to try the latest version, as we were unfortunately forced to rewrite using another library due to time constraints. Sorry this isn't terribly helpful for debugging purposes.

If it makes a difference, the calls to job.log were appearing in the UI under the same job object, even though it was being processed twice at the same time. The order they appeared in is the same as the order in my prior comment. So there wasn't job duplication or anything like that; it just seemed to start processing the same job twice after deciding the first run had stalled, even though job.log and job.progress were being called.
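
(For anyone trying to reproduce this, tagging each processor invocation as in the sketch below makes the interleaved output unambiguous; the runUpload helper and the tagging are purely illustrative, not our actual code.)

// Illustrative only: tag each invocation of the processor so output from
// two concurrent runs of the same job can be told apart.
queue.process(function (job, done) {
  var attempt = Math.random().toString(36).slice(2, 8);
  console.log('[%s] job %s started', attempt, job.jobId);
  // runUpload is a stand-in for the streaming upload shown earlier.
  runUpload(job.data, function onPart(part) {
    console.log('[%s] job %s part %d', attempt, job.jobId, part);
    job.progress(part);
  }, function onFinished(err) {
    console.log('[%s] job %s finished', attempt, job.jobId);
    done(err);
  });
});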

@xdc0
Contributor

xdc0 commented Jun 16, 2016

1: active
1: Uploading to S3
1: Part 1
1: Part 2
1: stalled
1: Part 4
1: active
1: Uploading to S3
1: Part 5
1: Part 1
1: Part 6
1: Part 2

@tjsail33 I'm assuming the "Part N" lines are progress reports; how come it jumped from Part 2 to Part 4? Did it not report Part 3, or am I missing something?

@timcosta
Author

@chuym Sorry, that was a typo. I corrected it. All parts were correctly reported.

@xdc0
Contributor

xdc0 commented Jun 20, 2016

@tjsail33 @manast: a second issue was opened to track this (#308), and it is fixed in 1.0rc4. Could you help us by testing your upload process against 1.0rc4?
Thanks!
