prevent "Job X has multiple successful attempts" error #4378
Summary
@jrhizor and I dug into this today. We think the root cause is a race condition between the
We believe the "Job X has multiple successful attempts" error and the "Job X is in status pending but has an attempt in status running" error share the same root cause.
The Bug
Here are the mechanics behind the problem:
The problem arises when that worker thread pool is highly subscribed. It means that the submitted
Possible Solutions
Future Work
@davinchia fyi
We are seeing this manifest here too.
I think this can also be solved by increasing MAX_WORKERS. Today this is 4, which lines up with the behaviour we are seeing in the logs, where repeated submission happens on the 5th job. Extract from the above linked logs:
Here it's the 4th, but I think that's because there is a running job taking up a slot. I think we should also make max-workers an injectable env var.
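For illustration, here is a minimal sketch of what an injectable max-workers setting could look like, and of why the job after the last free slot behaves differently. It is not the project's actual scheduler code: the class `WorkerPoolSketch`, the helper `resolveMaxWorkers`, and the `MAX_WORKERS` environment variable name are all assumptions made for this example. The default of 4 is the value quoted above.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class WorkerPoolSketch {

  // Default mirrors the value of 4 mentioned in this thread.
  static final int DEFAULT_MAX_WORKERS = 4;

  // Hypothetical helper: reads MAX_WORKERS from the given environment map and
  // falls back to the default when it is unset or not a positive integer.
  static int resolveMaxWorkers(Map<String, String> env) {
    String raw = env.get("MAX_WORKERS");
    if (raw == null) {
      return DEFAULT_MAX_WORKERS;
    }
    try {
      int parsed = Integer.parseInt(raw.trim());
      return parsed > 0 ? parsed : DEFAULT_MAX_WORKERS;
    } catch (NumberFormatException e) {
      return DEFAULT_MAX_WORKERS;
    }
  }

  // Submits poolSize + 1 blocking tasks to a fixed-size pool and returns how
  // many of them are still waiting in the queue instead of running.
  static int queuedAfterOversubscribing(int poolSize) throws InterruptedException {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    CountDownLatch release = new CountDownLatch(1);
    CountDownLatch started = new CountDownLatch(poolSize);
    try {
      for (int i = 0; i < poolSize + 1; i++) {
        pool.execute(() -> {
          started.countDown();
          try {
            release.await(); // hold the worker thread, like a long-running sync
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // Wait until every worker thread is occupied before inspecting the queue.
      started.await(5, TimeUnit.SECONDS);
      return pool.getQueue().size();
    } finally {
      release.countDown();
      pool.shutdown();
      pool.awaitTermination(5, TimeUnit.SECONDS);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    int maxWorkers = resolveMaxWorkers(System.getenv());
    System.out.println("max workers: " + maxWorkers);
    System.out.println("queued tasks: " + queuedAfterOversubscribing(maxWorkers));
  }
}
```

With a pool of 4, the 5th submission is accepted but sits in the queue without starting. If the submitter treats a job that has been accepted but not started as still needing submission, that would be consistent with the repeated submission seen on the 5th job. That last step is an assumption about the scheduler, not something this sketch demonstrates.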
There have now been several occurrences of this bug:
While there is a workaround (manually deleting the problematic record), this issue blocks all future syncs until it is manually resolved.
We need to figure out the root cause here.