You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
When Zarf fails to install a component it attempts to purge the helm install as shown below:
:package: KEYCLOAK COMPONENT
✔ Pushed 2 images to the zarf registry
⠹ preparing upgrade for keycloak
WARNING Retrying (1/3) in 5s: unable to complete the helm chart install/upgrade: another operation
(install/upgrade/rollback) is in progress
⠏ preparing upgrade for keycloak (6s)
WARNING Retrying (2/3) in 10s: unable to complete the helm chart install/upgrade: another
operation (install/upgrade/rollback) is in progress
⠙ preparing upgrade for keycloak (16s)
WARNING Retrying (3/3) in 20s: unable to complete the helm chart install/upgrade: another
operation (install/upgrade/rollback) is in progress
⠧ purge requested for keycloak
ERROR: Failed to deploy bundle: unable to deploy component "keycloak": unable to install helm chart(s):
unable to install chart after 3 attempts
On initial install this makes some amount of sense, and there is logic that attempts to prevent removal on a failed upgrade. However, in some edge cases (as shown above), Zarf may mistake an upgrade as an initial install and delete something that should instead be rolled back.
These rough steps caused an existing (system critical) deployment to be removed from the cluster:
Successfully deploy a Zarf package including Keycloak
Attempt an upgrade some time later
Realize the upgrade is doomed
ctrl-c out of the deploy (a timeout would have resulted in a rollback, but would have taken ~45mins)
Make a change and re-redeploy
Laugh/cry a little as Keycloak is removed from the cluster
The deletion is almost certainly explained by the aborted deployment leaving the helm release in an incomplete state, but the existence of the behavior in the first place is very concerning for long-lived production deployments.
Describe the solution you'd like
Given a Zarf package
When when there is an error during deployment
Then Zarf attempts a rollback (if available), and does nothing if not
In this particular case, a rollback would not have been possible either due to the invalid helm state -- but in 9/10 situations, I would rather have a broken deployment than no deployment..
Describe alternatives you've considered
Potentially, Zarf could catch the abort signal from ctrl-c and attempt a rollback immediately on exit rather than leaving the helm release in an incomplete state. (half baked idea.. but maybe there's something there..)
The text was updated successfully, but these errors were encountered:
)
## Description
Do not uninstall helm chart after failed install or upgrade
## Related Issue
Fixes#2455
## Checklist before merging
- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide
Steps](https://github.com/defenseunicorns/zarf/blob/main/.github/CONTRIBUTING.md#developer-workflow)
followed
---------
Co-authored-by: Austin Abro <37223396+AustinAbro321@users.noreply.github.com>
phillebaba
pushed a commit
to phillebaba/zarf
that referenced
this issue
May 7, 2024
Is your feature request related to a problem? Please describe.
When Zarf fails to install a component it attempts to purge the helm install as shown below:
On initial install this makes some amount of sense, and there is logic that attempts to prevent removal on a failed upgrade. However, in some edge cases (as shown above), Zarf may mistake an upgrade as an initial install and delete something that should instead be rolled back.
These rough steps caused an existing (system critical) deployment to be removed from the cluster:
The deletion is almost certainly explained by the aborted deployment leaving the helm release in an incomplete state, but the existence of the behavior in the first place is very concerning for long-lived production deployments.
Describe the solution you'd like
In this particular case, a rollback would not have been possible either due to the invalid helm state -- but in 9/10 situations, I would rather have a broken deployment than no deployment..
Describe alternatives you've considered
Potentially, Zarf could catch the abort signal from ctrl-c and attempt a rollback immediately on exit rather than leaving the helm release in an incomplete state. (half baked idea.. but maybe there's something there..)
The text was updated successfully, but these errors were encountered: