scheduler: recover from panic #12009

Merged 2 commits on Feb 7, 2022

Commits on Feb 4, 2022

  1. scheduler: recover from panic

    If processing a specific evaluation causes the scheduler (and
    therefore the entire server) to panic, that evaluation will never
    get a chance to be nack'd and cleared from the state store. It will
    get dequeued by another scheduler, causing that server to panic, and
    so forth until all servers are in a panic loop. This prevents the
    operator from intervening to remove the evaluation or update the
    state.
    
    Recover the goroutine from the top-level `Process` methods for each
    scheduler so that this condition can be detected without panicking the
    server process. This will lead to a loop of recovering the scheduler
    goroutine until the eval can be removed or nack'd, but that's much
    better than taking downtime.
    tgross committed Feb 4, 2022
    674a1b5
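
The recovery approach described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the `Evaluation` type, the `Scheduler` interface, `panickyScheduler`, and `recoveringScheduler` are all hypothetical names introduced here. It assumes only that each scheduler exposes a top-level `Process` method, and it shows a deferred `recover` turning a panic into an ordinary error so the calling process keeps running.

```go
package main

import "fmt"

// Evaluation is a hypothetical stand-in for the unit of work a scheduler handles.
type Evaluation struct {
	ID string
}

// Scheduler is a hypothetical interface with a top-level Process method.
type Scheduler interface {
	Process(eval *Evaluation) error
}

// panickyScheduler simulates scheduler logic that panics on an evaluation.
type panickyScheduler struct{}

func (panickyScheduler) Process(eval *Evaluation) error {
	var plan map[string]int
	plan[eval.ID] = 1 // assignment to a nil map triggers a runtime panic
	return nil
}

// recoveringScheduler wraps another Scheduler and converts panics into errors.
type recoveringScheduler struct {
	inner Scheduler
}

// Process uses a named return value so the deferred function can set the
// error after recovering from a panic in the wrapped scheduler.
func (s recoveringScheduler) Process(eval *Evaluation) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("processing eval %q panicked: %v", eval.ID, r)
		}
	}()
	return s.inner.Process(eval)
}

func main() {
	s := recoveringScheduler{inner: panickyScheduler{}}
	err := s.Process(&Evaluation{ID: "eval-1"})
	// The process survives; the caller sees an error it can log or act on.
	fmt.Println("recovered:", err != nil)
}
```

Because the panic surfaces as an error, the caller gets a chance to handle the failed evaluation instead of crashing. As the commit message notes, the same evaluation may still be retried repeatedly until it is removed or nack'd.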

Commits on Feb 7, 2022

  1. f13c7be