Inconsistent executions count with ActiveJob.retry_on #41

Open
eLod opened this issue Aug 17, 2021 · 7 comments
Labels
bug Something isn't working

Comments

eLod commented Aug 17, 2021

Rails 5.2.2, gem version 0.11.

I understand retry_on and company are not supported yet (though discard_on simply works), so this is not strictly a bug.

executions (along with provider_id and priority, though those two are not problematic) is filtered out of serialization (see ActiveJob::QueueAdapters#build_worker and ActiveJob::QueueAdapters::SERIALIZATION_FILTERED_KEYS) and supplied by Google Cloud Tasks instead (also see #6). However, because retry_on reschedules the job as a brand new task (new task id, retry count 0), the counter keeps resetting, which in turn leads to never-ending retries.

If retry_on functionality is desirable, I was thinking this information could be retained by putting it in job_meta (or even in the root worker_payload). If you are not opposed to this change I would gladly try to make a PR for it.
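
For illustration, a minimal sketch of the failure mode (ExampleJob is hypothetical; only the standard ActiveJob API is used):

```ruby
class ExampleJob < ApplicationJob
  retry_on StandardError, wait: 5.seconds, attempts: 3

  def perform
    # Expected: executions goes 1, 2, 3 and then the error is re-raised.
    # Observed under Cloudtasker: each retry_on reschedule creates a new task
    # (new task id, retry count 0), executions is rebuilt from the Cloud Tasks
    # value, so it stays at 1 and the attempts limit is never reached.
    Rails.logger.info("executions=#{executions}")
    raise StandardError, "boom"
  end
end
```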

eLod (Author) commented Aug 17, 2021

I checked: commenting out executions from SERIALIZATION_FILTERED_KEYS and the override with job_executions in JobWrapper#perform makes it work as expected (obviously this simply ignores the value sent in the headers).

Thinking about it further, I am not sure what the expected behaviour should be. retry_on already overlaps with the backend's retries: retry_on simply reschedules the task without raising the error (the currently running task completes), and when it runs out of attempts it re-raises the error (by default, i.e. without a block), which kicks in the backend's retry mechanism.

Sidekiq works this way too, but I haven't checked the executions count specifically, e.g. for a failing job with retry_on Exception, attempts: 2, after the 2 attempts, when it is retried by Sidekiq's own mechanism, does the executions count keep increasing or not?
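
To make that interplay concrete, a rough sketch (ReportJob and fetch_remote_report are hypothetical; only the standard ActiveJob API is used):

```ruby
class ReportJob < ApplicationJob
  # While the executions counter is below the attempts limit, ActiveJob catches
  # the error and reschedules the job itself; the currently running task
  # completes without raising, so the backend sees a success. Once the attempts
  # limit is reached the error is re-raised by default (no block given), and
  # only then does the backend's own retry mechanism kick in.
  retry_on Timeout::Error, wait: 10.seconds, attempts: 2

  def perform
    fetch_remote_report # hypothetical call that may raise Timeout::Error
  end
end
```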

alachaum (Member) commented

I don't have the full background as to why ActiveJob decided to go with the retry_on option. To me it seems (1) redundant with the retry capabilities of most backends and (2) confusing because error handling logic is therefore split between retry_on and the backend retry logic.

I'm going to close this issue until there is a strong use case for it, in which case we can reopen.

eLod (Author) commented Aug 18, 2021

Well, for me retry_on is the better/preferred way. First, it's consistent and doesn't lock me into the backend's mechanism, so I am free to switch between Sidekiq and Cloudtasker and be confident the API behaves the same. That's a big win. Second, it has way more features than the backend: for example I can specify a retry policy per exception class (though in Rails 5 the exceptions are counted with a single counter; that is corrected in Rails 6) without additional glue code (which is again backend specific and has to be refactored every time you switch backends).

I agree it's a bit confusing, but providing an empty block (or one that just does error tracking) prevents the backend mechanism from kicking in. I feel the overlap and confusion come from the backends all wanting to be usable without Rails or ActiveJob, so they "duplicate" its functionality (like Cloudtasker's on_error and on_dead, which are already baked into ActiveJob).
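
As a sketch of both points (SyncJob is hypothetical and ErrorTracker.notify is a made-up stand-in for whatever error reporting you use; the retry_on calls are standard ActiveJob API):

```ruby
class SyncJob < ApplicationJob
  # A different retry policy per exception class, with no backend-specific glue.
  retry_on Net::OpenTimeout, wait: 5.seconds, attempts: 5

  # When the attempts run out the block is called instead of re-raising, so the
  # backend's own retry mechanism never kicks in; here we just track the error.
  retry_on ActiveRecord::Deadlocked, wait: 1.second, attempts: 3 do |job, error|
    ErrorTracker.notify(error)
  end

  def perform(resource_id)
    # ...
  end
end
```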

I understand if you are opposed to this change, so I am not pushing it, but I thought I would share my 2 cents.

eLod (Author) commented Aug 18, 2021

Just checked with Sidekiq and can confirm it works as described: with retry_on Exception, wait: 5.seconds, attempts: 3 I see the 3 runs with executions 1, 2 and 3, then (the error is re-raised and) Sidekiq's own retry mechanism kicks in, but executions is never increased any more, so it stays at 3 for any further performs.

Edit: and just to be explicit, without retry_on (i.e. with only Sidekiq's own retry mechanism) executions is simply never incremented and stays at 1 no matter how many retries happen. So executions is mostly part of ActiveJob's own retry mechanism (i.e. retry_on).
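
Roughly the kind of job used for the check (FlakyJob is hypothetical; standard ActiveJob API with the Sidekiq adapter):

```ruby
class FlakyJob < ApplicationJob
  self.queue_adapter = :sidekiq

  retry_on Exception, wait: 5.seconds, attempts: 3

  def perform
    # Logs executions=1, 2, 3 across the retry_on reschedules; after the error
    # is re-raised and Sidekiq's own retry mechanism takes over, the logged
    # value stays at 3 for every further perform.
    Rails.logger.info("executions=#{executions}")
    raise "always fails"
  end
end
```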

alachaum added the bug (Something isn't working) label Aug 19, 2021
alachaum reopened this Aug 19, 2021
alachaum (Member) commented

Fair enough. I've reopened the issue. If you feel like opening a PR for it, feel free to do so. Otherwise I'll check how to address this some time in the future.

eLod (Author) commented Aug 19, 2021

I can cook up a PR for this next week. Do you have any suggestions? Should I just ignore the values from the headers (when using the JobWrapper only, so only for ActiveJob)? Should I add helper methods to ActiveJob (so users can actually get those values if they want to)?
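
To illustrate the "helper methods" idea, something along these lines (everything here is hypothetical and just for discussion; cloud_tasks_retry_count does not exist today):

```ruby
class ImportJob < ApplicationJob
  retry_on StandardError, wait: 5.seconds, attempts: 5

  def perform
    executions              # ActiveJob's own counter, kept consistent across retry_on reschedules
    cloud_tasks_retry_count # hypothetical helper exposing the retry count Cloud Tasks sends with the request
  end
end
```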

lagr commented Dec 20, 2023

It has been a while, but I've made a proposal for an intermediate solution based on some of the suggestions in this issue. It comes with a few sharp edges, but it would already provide an escape hatch around the override and could be extended further as needed. If you can spare some time, it would be great to get some feedback 🙏.
