Preserve only failed jobs #136

morgoth · 2020-09-14T13:41:06Z

It looks like there is currently no configuration option to preserve only failed jobs and delete successfully finished ones; preserve_job_records preserves all of them.

I think it would be very helpful to have this functionality out of the box (or via config), so that failed jobs can be investigated easily.

Is this easy to do on my own with the currently available GoodJob API?


bensheldon · 2020-09-15T16:44:59Z

@morgoth I think this is a great feature request and could see it implemented in GoodJob.

I looked at how easy it would be to implement outside of GoodJob; GoodJob isn't currently instrumented to make it easy :-( So there are two options:

Monkeypatch GoodJob::Job#perform by wrapping the original method, then delete the job if it finished without an error.
Write your own rake task that regularly deletes finished, unerrored jobs: GoodJob::Job.where(errored_at: nil).where.not(finished_at: nil).delete_all
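
The first option could look roughly like the sketch below. It uses Module#prepend to wrap a method and runs against a stand-in class (FakeJob, hypothetical) rather than the real GoodJob::Job, so it works without Rails or a database. The assumption that #perform takes no arguments and returns the rescued error (or nil) is made for illustration only and may not match GoodJob's actual behavior.

```ruby
# Stand-in for GoodJob::Job so the sketch runs without Rails or a database.
# Assumed contract: #perform runs the work and returns the rescued error, or nil.
class FakeJob
  attr_reader :destroyed

  def initialize(&work)
    @work = work
    @destroyed = false
  end

  def perform
    @work.call
    nil
  rescue StandardError => e
    e
  end

  def destroy!
    @destroyed = true
  end
end

# Wraps the original #perform and deletes the record only when no error occurred.
module DestroyOnSuccess
  def perform
    error = super
    destroy! if error.nil?
    error
  end
end

FakeJob.prepend(DestroyOnSuccess)
```

With a real ActiveRecord model the same prepend pattern would apply, but the wrapper would have to match whatever arguments and return value the installed GoodJob version uses.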

morgoth · 2020-09-16T10:48:22Z

@bensheldon How would you see it implemented?

I was thinking about changing config values for GoodJob.preserve_job_records to be something like :never, :always, :on_error
and then changing the perform method signature to something like def perform(preserve_record: GoodJob.preserve_job_records, reperform_on_standard_error: GoodJob.reperform_jobs_on_standard_error)

Then I guess we would also need to set finished_at on such errored record so it is not retried again.

Do you have a different idea in mind?

6temes · 2020-09-16T12:08:11Z

Sidekiq has an interesting way to manage failed jobs:

"Failed jobs" means the number of times a job has failed, even if it eventually ran successfully after several retries.

"Dead jobs" is a queue where jobs go once they have been retried and have failed more than N times.

Usually, when you have a problem with a job, you catch it in the retries queue, because jobs are usually retried a couple of times. When Sidekiq gives up, you can find them in "Dead jobs."

I think that this is a really useful pattern.

So, my point is that keeping jobs that have run successfully can be optional, but dead jobs should never be removed.

bensheldon · 2020-09-16T14:40:47Z

@morgoth Yes! That's very close to my initial thoughts. Some suggested adjustments:

The Boolean signature for GoodJob.preserve_job_records= needs to be preserved for compatibility. I think the smallest change would be to allow nil, false, true, and the new value :on_error.
For GoodJob::Job#perform, I've been thinking that we can remove the arguments entirely and use the GoodJob globals within #perform itself. I parameterized #perform originally because I wasn't sure how the globals would evolve, but I think it's cleaner to remove them altogether. The method definition would become simply def perform.
I think we could deprecate the predicate GoodJob.preserve_job_records? and instead ask that the value of the accessor GoodJob.preserve_job_records be used directly.
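
As a rough illustration of how the four accepted values could map onto a keep-or-delete decision, here is a minimal sketch. The helper preserve_record? and the constant VALID_PRESERVE_VALUES are hypothetical names, not part of GoodJob, and :on_error is simply the name proposed in this comment.

```ruby
# Hypothetical helper, not part of GoodJob: decides whether a finished job's
# record should be kept, given the configured value and whether the job errored.
VALID_PRESERVE_VALUES = [nil, false, true, :on_error].freeze

def preserve_record?(setting, errored:)
  unless VALID_PRESERVE_VALUES.include?(setting)
    raise ArgumentError, "unsupported preserve_job_records value: #{setting.inspect}"
  end

  case setting
  when :on_error then errored  # keep only records that ended with an error
  else setting ? true : false  # nil/false never keep, true always keeps
  end
end
```

Existing Boolean configurations keep their meaning under this mapping, which is the compatibility property asked for above.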

> Then I guess we would also need to set finished_at on such errored record so it is not retried again.

Exactly! I spent an uncomfortable amount of time staring at this part of the code recently, but it's where I think the changes would take place:

good_job/lib/good_job/job.rb

Lines 230 to 240 in 5b59acd

    
    if rescued_error && reperform_on_standard_error
      save!
    else
      self.finished_at = Time.current
      if destroy_after
        destroy!
      else
        save!
      end
    end
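
One way the excerpt above might change to support an :on_error mode is sketched below. It is an assumption-laden illustration, not the merged implementation: FinishableJob is a plain Struct standing in for the ActiveRecord model, and the finish method, its preserve: keyword, and the injectable now: argument are all hypothetical.

```ruby
# Stand-in record; the real GoodJob::Job is an ActiveRecord model.
# `state` only records which persistence call was made.
FinishableJob = Struct.new(:finished_at, :state) do
  def save!
    self.state = :saved
  end

  def destroy!
    self.state = :destroyed
  end

  # Hypothetical variant of the excerpt above. `preserve` is one of
  # false, true or :on_error; `now` is injectable to keep the sketch testable.
  def finish(rescued_error:, reperform_on_standard_error:, preserve:, now: Time.now)
    if rescued_error && reperform_on_standard_error
      save! # finished_at stays unset, so the job can be performed again
    else
      self.finished_at = now # an errored job is marked finished so it is not retried
      keep = preserve == :on_error ? !rescued_error.nil? : preserve
      keep ? save! : destroy!
    end
    self
  end
end
```

The only behavioral difference from the excerpt is the `keep` line: the destroy decision now depends on whether an error was rescued, instead of on a single Boolean.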

@6temes Thanks for sharing how Sidekiq organizes them. That's really helpful. I like the idea of Failed vs Dead.

I'm trying to think about the analogous data with GoodJob and it's complicated. A wrinkle of GoodJob is that an ActiveJob job, when retried, will generate a new GoodJob Job in the database. In other words, there is not a 1-to-1 correspondence between an ActiveJob Job and a GoodJob Job.

To determine a "Dead" ActiveJob Job would require identifying Errored GoodJob Jobs that don't have a newer matching enqueued job. That's doable, but it might be unsatisfyingly messy. I'm imagining, for example, that a GoodJob::Job, if it is a retried job, could store a reference to the previously errored job that generated it. To give an example of the messiness, currently tracking the error state on retries/discards requires passing global state around:

good_job/lib/good_job/railtie.rb

Lines 9 to 17 in 5b59acd

    
    initializer "good_job.active_job_notifications" do
      ActiveSupport::Notifications.subscribe "enqueue_retry.active_job" do |event|
        GoodJob::CurrentExecution.error_on_retry = event.payload[:error]
      end

      ActiveSupport::Notifications.subscribe "discard.active_job" do |event|
        GoodJob::CurrentExecution.error_on_discard = event.payload[:error]
      end
    end
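
To make the "global state" concrete, here is a minimal sketch of a per-thread holder that subscribers like the ones above could write to. CurrentExecutionSketch is a hypothetical stand-in; how the real GoodJob::CurrentExecution stores its values is not shown in this thread, and the reset method is an assumption.

```ruby
# Hypothetical stand-in for GoodJob::CurrentExecution: per-thread slots that
# notification subscribers write to and the job runner reads afterwards.
module CurrentExecutionSketch
  def self.error_on_retry
    Thread.current.thread_variable_get(:sketch_error_on_retry)
  end

  def self.error_on_retry=(error)
    Thread.current.thread_variable_set(:sketch_error_on_retry, error)
  end

  def self.error_on_discard
    Thread.current.thread_variable_get(:sketch_error_on_discard)
  end

  def self.error_on_discard=(error)
    Thread.current.thread_variable_set(:sketch_error_on_discard, error)
  end

  # Clear both slots before each job so errors do not leak between jobs
  # performed on the same thread.
  def self.reset
    self.error_on_retry = nil
    self.error_on_discard = nil
  end
end
```

Because the slots are thread variables, a value written while one thread performs a job is not visible to jobs running on other threads.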

I have an appetite for using global state to enable these features, but, well, it's complicated.
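
The "Dead" lookup described above (an errored GoodJob job with no newer matching job) could be sketched in plain Ruby as follows. JobRow and dead_rows are hypothetical, and the sketch assumes every record created for the same ActiveJob job shares an active_job_id value, which is an assumption about the data rather than a documented column.

```ruby
# Hypothetical row shape; `active_job_id` is assumed to be shared by every
# GoodJob record created for the same ActiveJob job across retries.
JobRow = Struct.new(:id, :active_job_id, :created_at, :error, keyword_init: true)

# Returns the rows whose ActiveJob job has no newer attempt and whose
# latest attempt ended with an error.
def dead_rows(rows)
  rows.group_by(&:active_job_id).values
      .map { |attempts| attempts.max_by(&:created_at) }
      .select(&:error)
end
```

A job that errored once and then succeeded on retry is not reported, because its newest row has no error; that matches the Failed versus Dead distinction discussed earlier in the thread.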


bensheldon added the enhancement label Sep 15, 2020

morgoth mentioned this issue Sep 16, 2020

Extract "execute" method to reduce "perform" method complexity #138

Merged

morgoth added a commit to tiramizoo/good_job that referenced this issue Sep 17, 2020

Remove arguments from perform method

5ca59ca

Follow up to bensheldon#136 (comment)

morgoth mentioned this issue Sep 17, 2020

Remove arguments from perform method #140

Merged

morgoth added a commit to tiramizoo/good_job that referenced this issue Sep 17, 2020

Remove arguments from perform method

72ef163

Follow up to bensheldon#136 (comment)

morgoth mentioned this issue Sep 18, 2020

Add GoodJob.preserve_job_records = :on_unhandled_error option to only preserve jobs that errored #145

Merged

bensheldon closed this as completed in #145 Sep 21, 2020

connorchris831 pushed a commit to connorchris831/good_job that referenced this issue Dec 19, 2022

Remove arguments from perform method

29b0786

Follow up to bensheldon/good_job#136 (comment)

legendarydeveloper919 added a commit to legendarydeveloper919/good_job that referenced this issue Mar 15, 2024

Remove arguments from perform method

5c86981

Follow up to bensheldon/good_job#136 (comment)
