The scheduler detects the following three types of job failures:
1. Jobs that fail to launch (due to shortage of background workers)
2. Jobs that throw a runtime error
3. Jobs that crash due to a process crashing

In cases 2 and 3, additive backoff was applied in calculating the next start time of a failed job. In case 1, no backoff was added and the scheduler immediately re-attempted to launch the job.

This commit converts the calculation of backoff from additive to exponential, doubling the wait time between consecutive failures. The maximum backoff for cases 2 and 3 is SCHEDULE_INTERVAL. It also adds exponential backoff for case 1, starting with a delay of 2 seconds and doubling that up to a ceiling value of 1 minute.

For jobs that launch and complete successfully, there is no backoff and the next_start field is calculated simply by adding the schedule_interval to the finish time, as before.

Fixes #4562
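
The backoff rules described in this message can be illustrated with a minimal Python sketch. It is not the scheduler's actual code. The 2 second start and 1 minute ceiling for launch failures, and the SCHEDULE_INTERVAL cap for cases 2 and 3, come from the message above. The function names, the `consecutive_failures` counter, and the `base_delay` used for cases 2 and 3 are hypothetical, since the message does not state the starting delay for those cases.

```python
from datetime import datetime, timedelta

# Values stated in the commit message for case 1 (launch failures).
LAUNCH_BASE_DELAY = timedelta(seconds=2)
LAUNCH_MAX_DELAY = timedelta(minutes=1)


def exponential_backoff(base: timedelta, consecutive_failures: int,
                        ceiling: timedelta) -> timedelta:
    """Return `base` doubled once per failure after the first, capped at `ceiling`."""
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be >= 1")
    delay = base
    for _ in range(consecutive_failures - 1):
        if delay >= ceiling:
            break  # already at the cap, no need to keep doubling
        delay *= 2
    return min(delay, ceiling)


def next_start_after_launch_failure(now: datetime,
                                    consecutive_failures: int) -> datetime:
    """Case 1: start at 2 seconds, double up to a ceiling of 1 minute."""
    return now + exponential_backoff(
        LAUNCH_BASE_DELAY, consecutive_failures, LAUNCH_MAX_DELAY)


def next_start_after_run_failure(finish_time: datetime,
                                 consecutive_failures: int,
                                 base_delay: timedelta,
                                 schedule_interval: timedelta) -> datetime:
    """Cases 2 and 3: double per failure, capped at the schedule interval.

    `base_delay` is a hypothetical parameter; the commit message does not
    say what the first delay is for these cases.
    """
    return finish_time + exponential_backoff(
        base_delay, consecutive_failures, schedule_interval)


def next_start_after_success(finish_time: datetime,
                             schedule_interval: timedelta) -> datetime:
    """Successful run: no backoff, just finish time plus schedule interval."""
    return finish_time + schedule_interval
```

Under these assumptions, consecutive launch failures wait 2, 4, 8, 16 and 32 seconds, and then 60 seconds for every further failure.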
Showing 5 changed files with 183 additions and 58 deletions.