Guard against the retry key being set and no retry_count present #346

Merged: 1 commit, Nov 24, 2015

Contributor

aselder commented Nov 23, 2015

This fixes #345

@jondeandres
Contributor

@aselder which version of sidekiq are you using?

when msg_or_context['retry_count'] is nil, which value of msg_or_context["retry"] do you have?

@aselder
Contributor Author

aselder commented Nov 23, 2015

@jondeandres We're currently on sidekiq 2.17.3. We're planning on upgrading soon, but this is a blocker on us using rollbar until it's fixed.

Not sure what the value of msg_or_context["retry"] is, but I'll try to dig it up with some debuggers.

@aselder
Contributor Author

aselder commented Nov 24, 2015

@jondeandres msg_or_context["retry"] is set to 'true' for us, but there is no retry_count

@pacoguzman
Contributor

Thanks @aselder, I think this is more reliable than the existing version, so LGTM. We check retry because you can configure your worker not to be retried; in that case retry has a false value and we don't want to skip the report. For retriable workers we check the retry_count, but as you've found, we need to be sure we have a value for that key.

Good catch thank you!
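
For readers following along, the guard discussed above can be sketched roughly as below. This is a minimal illustration, not the gem's actual code: the method name `skip_report_guarded?` and the explicit `threshold` argument are assumptions made for the example (the thread only names `msg_or_context`, `retry` and `retry_count`).

```ruby
# Hypothetical helper (not the gem's real method). Returns true when the
# report should be skipped because the job is still below the threshold.
def skip_report_guarded?(msg_or_context, threshold)
  return false unless msg_or_context.is_a?(Hash)

  # `retry` is false for workers configured not to retry: always report those.
  return false unless msg_or_context['retry']

  retry_count = msg_or_context['retry_count']
  # The guard: with no retry_count there is nothing to compare against,
  # so report instead of calling `<` on nil.
  return false if retry_count.nil?

  retry_count < threshold
end

skip_report_guarded?({ 'retry' => true }, 3)                     # false, and no NoMethodError
skip_report_guarded?({ 'retry' => true, 'retry_count' => 1 }, 3) # true
skip_report_guarded?({ 'retry' => false }, 3)                    # false
```

Without the `nil` check, the first call would compare `nil < 3` and raise the `NoMethodError` reported in the linked issue.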

jondeandres added a commit that referenced this pull request Nov 24, 2015
Guard against the retry key being set and no retry_count present
@jondeandres jondeandres merged commit 4a6965c into rollbar:master Nov 24, 2015
@jondeandres
Contributor

@aselder @pacoguzman we've released 2.6.2 with this fix. Thanks.

ejoubaud added a commit to ejoubaud/rollbar-gem that referenced this pull request Jul 2, 2018
Fixes a bug where the first failure of a Sidekiq job would always get reported regardless of the `sidekiq_threshold` setting, because at that point `job_hash['retry_count']` is not set.

Looking at rollbar#319, the current behaviour is definitely not the expected one, and I believe it wasn't the original one either: the bug was introduced by a breaking change in Sidekiq that wasn't accounted for in this gem.

Back when this feature was implemented, in rollbar#319, Sidekiq's retry mechanism was implemented as a Sidekiq middleware, so it would run downstream of the Rollbar middleware. When an error bubbled up, it would first encounter the Sidekiq `retry_job` middleware, which would set/increment the job's `retry_count` ([code](https://github.com/mperham/sidekiq/blob/d7d000465cd086160843fe95b8836b22d67578b6/lib/sidekiq/middleware/server/retry_jobs.rb#L107-L113) triggered [here](https://github.com/mperham/sidekiq/blob/d7d000465cd086160843fe95b8836b22d67578b6/lib/sidekiq/middleware/server/retry_jobs.rb#L83)) and reraise. Only then would the exception meet the Rollbar middleware, with the current job's `retry_count` already set.

Then 2 years ago, sidekiq/sidekiq#3235 introduced Sidekiq 5.0.0, and with it a breaking change: `retry_job` would no longer be a middleware but part of the Sidekiq processor. So it now runs upstream of the Sidekiq middleware chain, including the Rollbar middleware, meaning that when an error bubbles up, it meets the Rollbar middleware before Sidekiq's `retry_job` has set the correct `retry_count`.
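
To make the ordering change concrete, here is a toy simulation of the two layouts described above. Everything in it is a stand-in written for this illustration (the method names are hypothetical and none of this is Sidekiq's or Rollbar's real code); it only shows which layer rescues the exception first.

```ruby
# Toy stand-ins (hypothetical names), not Sidekiq's or Rollbar's real classes.

# Retry layer: sets or increments retry_count on failure, then re-raises.
def with_retry_bookkeeping(job)
  yield
rescue StandardError
  job['retry_count'] = job.key?('retry_count') ? job['retry_count'] + 1 : 0
  raise
end

# Reporter layer: records the retry_count it can see when it rescues.
def with_error_reporter(job, seen)
  yield
rescue StandardError
  seen << job['retry_count']
  raise
end

# Layout before Sidekiq 5, as described above: the retry layer rescues first.
def run_pre_5(job, seen)
  with_error_reporter(job, seen) do
    with_retry_bookkeeping(job) { raise 'boom' }
  end
rescue StandardError
  nil
end

# Layout from Sidekiq 5 on: the retry layer wraps the whole middleware
# chain, so the reporter rescues first.
def run_5_plus(job, seen)
  with_retry_bookkeeping(job) do
    with_error_reporter(job, seen) { raise 'boom' }
  end
rescue StandardError
  nil
end

old_seen = []
run_pre_5({ 'retry' => true }, old_seen)  # old_seen == [0]

new_seen = []
run_5_plus({ 'retry' => true }, new_seen) # new_seen == [nil]
```

In the second layout the reporter sees no `retry_count` on the first failure, which is exactly the case the threshold check has to handle.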

You'll note that I'm changing a spec about how the middleware should behave when `retry_count` is not set. That spec was introduced in rollbar#346 as a way to fix a `nil` error. It doesn't look like the intent was to actually improve the behaviour of `sidekiq_threshold`, and it seems the contributor may not have been using the feature, just working around a bug that it was causing. Anyway, the behaviour specified doesn't make sense to me, and it contradicts the expected behaviour described in the original implementation's PR (rollbar#319). If `retry` is truthy and `retry_count` is falsy, the behaviour should not be to ignore the threshold, but to enforce it by treating `retry_count` as 0. Otherwise we get the bug where the first failure gets reported to Rollbar regardless of the configured threshold, and then the next n-1 failures get rightfully ignored.
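
A rough sketch of the proposed rule, under the same caveats as any illustration: the method name `skip_report_defaulting?` and the explicit `threshold` argument are hypothetical, and this is not claimed to be the code that was committed. It only shows a missing `retry_count` being treated as 0 rather than bypassing the threshold.

```ruby
# Hypothetical helper (not the gem's real method). A missing retry_count
# is treated as 0, so the threshold also applies to the first failure.
def skip_report_defaulting?(job_hash, threshold)
  return false unless job_hash.is_a?(Hash)

  # Non-retriable jobs are always reported.
  return false unless job_hash['retry']

  retry_count = job_hash['retry_count'] || 0
  retry_count < threshold
end

skip_report_defaulting?({ 'retry' => true }, 2)                     # true: first failure is skipped
skip_report_defaulting?({ 'retry' => true, 'retry_count' => 2 }, 2) # false: threshold reached, report
skip_report_defaulting?({ 'retry' => false }, 2)                    # false: non-retriable, report
```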
Successfully merging this pull request may close these issues.

NoMethodError: undefined method `<' for nil:NilClass in Rollbar::Sidekiq