0.2.0: Handle update-reboot failures/ "crash loops" #123

Closed
cbgbt opened this issue Nov 23, 2021 · 1 comment

cbgbt commented Nov 23, 2021

If an update-reboot fails and the node is forced to roll back to the previous partition, brupop will keep trying to add the node to the active set, updating it endlessly and needlessly.

We should record attempted updates (or more generically, attempted state transitions) and give up after some number of attempts, recording a failure metric.
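As a rough illustration of the idea (not brupop's actual implementation), attempt tracking with a give-up threshold could look something like the sketch below. Every name here (`UpdateAttemptTracker`, `Decision`, `begin_attempt`, `record_success`, `failure_count`) is hypothetical, state is kept in memory rather than in the custom resource, and the "failure metric" is just a counter standing in for whatever metrics system would really be used.

```rust
use std::collections::HashMap;

/// Result of asking whether a node may start another update attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// The update may proceed; carries the 1-based attempt number.
    Attempt(u32),
    /// The node has used up its attempts and should be left alone.
    GiveUp,
}

/// Per-node bookkeeping (hypothetical; a real controller would likely
/// persist this in the node's custom resource status instead).
#[derive(Default)]
struct NodeRecord {
    attempts: u32,
    gave_up: bool,
}

/// Records attempted updates per node and stops after `max_attempts`.
pub struct UpdateAttemptTracker {
    max_attempts: u32,
    records: HashMap<String, NodeRecord>,
    /// Stand-in for a failure metric: number of nodes we gave up on.
    failure_metric: u64,
}

impl UpdateAttemptTracker {
    pub fn new(max_attempts: u32) -> Self {
        Self {
            max_attempts,
            records: HashMap::new(),
            failure_metric: 0,
        }
    }

    /// Call before adding a node to the active set for an update.
    pub fn begin_attempt(&mut self, node: &str) -> Decision {
        let record = self.records.entry(node.to_string()).or_default();
        if record.attempts >= self.max_attempts {
            // Count each node in the failure metric only once.
            if !record.gave_up {
                record.gave_up = true;
                self.failure_metric += 1;
            }
            return Decision::GiveUp;
        }
        record.attempts += 1;
        Decision::Attempt(record.attempts)
    }

    /// Call when a node comes back on the new version; clears its history.
    pub fn record_success(&mut self, node: &str) {
        self.records.remove(node);
    }

    /// Current value of the stand-in failure metric.
    pub fn failure_count(&self) -> u64 {
        self.failure_metric
    }
}
```

With `max_attempts` set to 2, a node that keeps rolling back would get two attempts, then `GiveUp` on every later call, and would be counted once in the failure metric. The same shape would apply to generic state transitions by keying the map on (node, transition) instead of the node name alone.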

cbgbt commented Apr 5, 2022

This is merged, but we want to use the migrations work in #161 to ensure that a migration exists from the existing 0.2.0 CRD. We'll release this in another pre-GA release.
