Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow marking releases stuck in a pending state as failed #116

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions pkg/client/actionclient.go
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,7 @@ type ActionInterface interface {
Get(name string, opts ...GetOption) (*release.Release, error)
Install(name, namespace string, chrt *chart.Chart, vals map[string]interface{}, opts ...InstallOption) (*release.Release, error)
Upgrade(name, namespace string, chrt *chart.Chart, vals map[string]interface{}, opts ...UpgradeOption) (*release.Release, error)
MarkFailed(release *release.Release, reason string) error
Uninstall(name string, opts ...UninstallOption) (*release.UninstallReleaseResponse, error)
Reconcile(rel *release.Release) error
}
Expand Down Expand Up @@ -180,6 +181,14 @@ func (c *actionClient) Upgrade(name, namespace string, chrt *chart.Chart, vals m
return rel, nil
}

func (c *actionClient) MarkFailed(rel *release.Release, reason string) error {
infoCopy := *rel.Info
releaseCopy := *rel
releaseCopy.Info = &infoCopy
releaseCopy.SetStatus(release.StatusFailed, reason)
return c.conf.Releases.Update(&releaseCopy)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I see how it's convenient to put this functionality into the actionClient, I'm not convinced it makes a ton of sense otherwise. There's really only one way to mark a release as failed (this is it), so I'm wondering if we pull this back out of the action client interface and just put this logic directly into the reconciler.

The only missing piece I see for that is giving the Reconciler an ActionConfigGetter field, which would likely just involve adding the field, adding a WithActionConfigGetter functional option, and then tweaking the addDefaults function slightly to handle the fact that the reconciler may already have an ActionConfigGetter setup via the new functional option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, extracted it and now head to adding a fake implementation for the ActionConfigGetter. After that it should be finished. The implementation is still a bit rough though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay @joelanford.
I've tried to fake the ActionConfig but it was more complex than expected, i.e. also interacting with the storage.Storage interface to call the Update func for the given release.
I stopped from there and wondered if it fits the abstraction if the MarkFailed func is renamed to Update to be more generic, so the ActionClient wraps all interactions with Helm in a single struct.
Also the already existing fake client can be leveraged and easily extended.

If theUpdate func does not match the expectations I would implement a fake ActionConfig with a memory release driver in a separate PR.


func (c *actionClient) Uninstall(name string, opts ...UninstallOption) (*release.UninstallReleaseResponse, error) {
uninstall := action.NewUninstall(c.conf)
for _, o := range opts {
Expand Down
1 change: 1 addition & 0 deletions pkg/reconciler/internal/conditions/conditions.go
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ const (
ReasonUpgradeError = status.ConditionReason("UpgradeError")
ReasonReconcileError = status.ConditionReason("ReconcileError")
ReasonUninstallError = status.ConditionReason("UninstallError")
ReasonPendingError = status.ConditionReason("PendingError")
)

func Initialized(stat corev1.ConditionStatus, reason status.ConditionReason, message interface{}) status.Condition {
Expand Down
58 changes: 36 additions & 22 deletions pkg/reconciler/internal/fake/actionclient.go
Original file line number Diff line number Diff line change
Expand Up @@ -48,17 +48,19 @@ func (hcg *fakeActionClientGetter) ActionClientFor(_ crclient.Object) (client.Ac
}

type ActionClient struct {
Gets []GetCall
Installs []InstallCall
Upgrades []UpgradeCall
Uninstalls []UninstallCall
Reconciles []ReconcileCall

HandleGet func() (*release.Release, error)
HandleInstall func() (*release.Release, error)
HandleUpgrade func() (*release.Release, error)
HandleUninstall func() (*release.UninstallReleaseResponse, error)
HandleReconcile func() error
Gets []GetCall
Installs []InstallCall
Upgrades []UpgradeCall
MarkFaileds []MarkFailedCall
Uninstalls []UninstallCall
Reconciles []ReconcileCall

HandleGet func() (*release.Release, error)
HandleInstall func() (*release.Release, error)
HandleUpgrade func() (*release.Release, error)
HandleMarkFailed func() error
HandleUninstall func() (*release.UninstallReleaseResponse, error)
HandleReconcile func() error
}

func NewActionClient() ActionClient {
Expand All @@ -72,17 +74,19 @@ func NewActionClient() ActionClient {
return func() error { return err }
}
return ActionClient{
Gets: make([]GetCall, 0),
Installs: make([]InstallCall, 0),
Upgrades: make([]UpgradeCall, 0),
Uninstalls: make([]UninstallCall, 0),
Reconciles: make([]ReconcileCall, 0),

HandleGet: relFunc(errors.New("get not implemented")),
HandleInstall: relFunc(errors.New("install not implemented")),
HandleUpgrade: relFunc(errors.New("upgrade not implemented")),
HandleUninstall: uninstFunc(errors.New("uninstall not implemented")),
HandleReconcile: recFunc(errors.New("reconcile not implemented")),
Gets: make([]GetCall, 0),
Installs: make([]InstallCall, 0),
Upgrades: make([]UpgradeCall, 0),
Uninstalls: make([]UninstallCall, 0),
Reconciles: make([]ReconcileCall, 0),
MarkFaileds: make([]MarkFailedCall, 0),

HandleGet: relFunc(errors.New("get not implemented")),
HandleInstall: relFunc(errors.New("install not implemented")),
HandleUpgrade: relFunc(errors.New("upgrade not implemented")),
HandleUninstall: uninstFunc(errors.New("uninstall not implemented")),
HandleReconcile: recFunc(errors.New("reconcile not implemented")),
HandleMarkFailed: recFunc(errors.New("mark failed not implemented")),
}
}

Expand All @@ -109,6 +113,11 @@ type UpgradeCall struct {
Opts []client.UpgradeOption
}

type MarkFailedCall struct {
Release *release.Release
Reason string
}

type UninstallCall struct {
Name string
Opts []client.UninstallOption
Expand All @@ -133,6 +142,11 @@ func (c *ActionClient) Upgrade(name, namespace string, chrt *chart.Chart, vals m
return c.HandleUpgrade()
}

func (c *ActionClient) MarkFailed(rel *release.Release, reason string) error {
c.MarkFaileds = append(c.MarkFaileds, MarkFailedCall{rel, reason})
return c.HandleMarkFailed()
}

func (c *ActionClient) Uninstall(name string, opts ...client.UninstallOption) (*release.UninstallReleaseResponse, error) {
c.Uninstalls = append(c.Uninstalls, UninstallCall{name, opts})
return c.HandleUninstall()
Expand Down
51 changes: 51 additions & 0 deletions pkg/reconciler/reconciler.go
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,7 @@ type Reconciler struct {
skipDependentWatches bool
maxConcurrentReconciles int
reconcilePeriod time.Duration
markFailedAfter time.Duration
maxHistory int

annotSetupOnce sync.Once
Expand Down Expand Up @@ -297,6 +298,18 @@ func WithMaxReleaseHistory(maxHistory int) Option {
}
}

// WithMarkFailedAfter specifies the duration after which the reconciler will mark a release in a pending (locked)
// state as false in order to allow rolling forward.
func WithMarkFailedAfter(duration time.Duration) Option {
return func(r *Reconciler) error {
if duration < 0 {
return errors.New("auto-rollback after duration must not be negative")
}
r.markFailedAfter = duration
return nil
}
}

// WithInstallAnnotations is an Option that configures Install annotations
// to enable custom action.Install fields to be set based on the value of
// annotations found in the custom resource watched by this reconciler.
Expand Down Expand Up @@ -531,6 +544,10 @@ func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (res ctrl.
)
return ctrl.Result{}, err
}
if state == statePending {
return r.handlePending(actionClient, rel, &u, log)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joelanford To be honest, I don't get to a good solution here and would like that you take a decision here.

1. handlePending in actionClient
My suggestion would be moving the handlePending to the actionClient.
This makes sense as it is a Helm state which is handled such the actionClient abstracts Helm interactions.
To include handlePending in handleReconcile the statePending must be checked at a different execution time (within the switch state in line 559).

2. Moving handlePending to the switch state
As far as I see the only disadvantage is that pre-hooks are executed before the pending state is resolved. This is not necessarily bad and depends more on our definition of pre-hooks/extensions functions.

}

u.UpdateStatus(updater.EnsureCondition(conditions.Irreconcilable(corev1.ConditionFalse, "", "")))

for _, h := range r.preHooks {
Expand Down Expand Up @@ -597,6 +614,7 @@ const (
stateNeedsInstall helmReleaseState = "needs install"
stateNeedsUpgrade helmReleaseState = "needs upgrade"
stateUnchanged helmReleaseState = "unchanged"
statePending helmReleaseState = "pending"
stateError helmReleaseState = "error"
)

Expand Down Expand Up @@ -645,6 +663,10 @@ func (r *Reconciler) getReleaseState(client helmclient.ActionInterface, obj meta
return nil, stateNeedsInstall, nil
}

if currentRelease.Info != nil && currentRelease.Info.Status.IsPending() {
return currentRelease, statePending, nil
}

var opts []helmclient.UpgradeOption
if r.maxHistory > 0 {
opts = append(opts, func(u *action.Upgrade) error {
Expand Down Expand Up @@ -722,6 +744,35 @@ func (r *Reconciler) doUpgrade(actionClient helmclient.ActionInterface, u *updat
return rel, nil
}

func (r *Reconciler) handlePending(actionClient helmclient.ActionInterface, rel *release.Release, u *updater.Updater, log logr.Logger) (ctrl.Result, error) {
err := r.doHandlePending(actionClient, rel, log)
if err == nil {
err = errors.New("unknown error handling pending release")
}
u.UpdateStatus(
updater.EnsureCondition(conditions.Irreconcilable(corev1.ConditionTrue, conditions.ReasonPendingError, err)))
return ctrl.Result{}, err
}

func (r *Reconciler) doHandlePending(actionClient helmclient.ActionInterface, rel *release.Release, log logr.Logger) error {
if r.markFailedAfter <= 0 {
return errors.New("Release is in a pending (locked) state and cannot be modified. User intervention is required.")
}
if rel.Info == nil || rel.Info.LastDeployed.IsZero() {
return errors.New("Release is in a pending (locked) state and lacks 'last deployed' timestamp. User intervention is required.")
}
if pendingSince := time.Since(rel.Info.LastDeployed.Time); pendingSince < r.markFailedAfter {
return fmt.Errorf("Release is in a pending (locked) state and cannot currently be modified. Release will be marked failed to allow a roll-forward in %v.", r.markFailedAfter-pendingSince)
}

log.Info("Marking release as failed", "releaseName", rel.Name)
err := actionClient.MarkFailed(rel, fmt.Sprintf("operator marked pending (locked) release as failed after state did not change for %v", r.markFailedAfter))
if err != nil {
return fmt.Errorf("Failed to mark pending (locked) release as failed: %w", err)
}
return fmt.Errorf("marked release %s as failed to allow upgrade to succeed in next reconcile attempt", rel.Name)
}

func (r *Reconciler) reportOverrideEvents(obj runtime.Object) {
for k, v := range r.overrideValues {
r.eventRecorder.Eventf(obj, "Warning", "ValueOverridden",
Expand Down
66 changes: 66 additions & 0 deletions pkg/reconciler/reconciler_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,7 @@ import (
helmfake "github.com/operator-framework/helm-operator-plugins/pkg/reconciler/internal/fake"
"github.com/operator-framework/helm-operator-plugins/pkg/sdk/controllerutil"
"github.com/operator-framework/helm-operator-plugins/pkg/values"
helmTime "helm.sh/helm/v3/pkg/time"
)

var _ = Describe("Reconciler", func() {
Expand Down Expand Up @@ -382,6 +383,12 @@ var _ = Describe("Reconciler", func() {
Expect(r.valueTranslator.Translate(context.Background(), &unstructured.Unstructured{})).To(Equal(chartutil.Values{"translated": true}))
})
})
var _ = Describe("WithMarkedFailAfter", func() {
It("should set the reconciler mark failed after duration", func() {
Expect(WithMarkFailedAfter(1 * time.Minute)(r)).To(Succeed())
Expect(r.markFailedAfter).To(Equal(1 * time.Minute))
})
})
})

var _ = Describe("Reconcile", func() {
Expand Down Expand Up @@ -474,6 +481,65 @@ var _ = Describe("Reconciler", func() {
Expect(mgr.GetClient().Create(ctx, obj)).To(Succeed())
})

When("release is in pending state", func() {
fakeClient := helmfake.NewActionClient()
deployTime := helmTime.Now().Add(-5 * time.Minute)
exampleRelease := &release.Release{
Name: "example-release",
Info: &release.Info{
Status: release.StatusPendingUpgrade,
LastDeployed: deployTime,
FirstDeployed: deployTime,
},
}

BeforeEach(func() {
r.actionClientGetter = helmclient.ActionClientGetterFunc(func(object client.Object) (helmclient.ActionInterface, error) {
fakeClient.HandleGet = func() (*release.Release, error) {
return exampleRelease, nil
}
fakeClient.HandleMarkFailed = func() error {
return nil
}
return &fakeClient, nil
})
})
AfterEach(func() {
r.actionClientGetter = nil
})

When("time elapsed since last deployment exceeds markFailedAfter duration", func() {
It("should be marked as failed", func() {
r.markFailedAfter = 1 * time.Minute
res, err := r.Reconcile(ctx, req)
Expect(res).To(Equal(reconcile.Result{}))
Expect(err).ToNot(BeNil())
Expect(err).To(MatchError("marked release example-release as failed to allow upgrade to succeed in next reconcile attempt"))
Expect(len(fakeClient.MarkFaileds)).Should(Equal(1))
})
})

When("markFailedAfter is disabled", func() {
It("should require user intervention", func() {
r.markFailedAfter = 0
res, err := r.Reconcile(ctx, req)
Expect(res).To(Equal(reconcile.Result{}))
Expect(err).ToNot(BeNil())
Expect(err).To(MatchError("Release is in a pending (locked) state and cannot be modified. User intervention is required."))
})
})

When("time since last deployment is higher than markFiledAfter duration", func() {
It("should return duration until the release will be marked as failed", func() {
r.markFailedAfter = 10 * time.Minute
res, err := r.Reconcile(ctx, req)
Expect(res).To(Equal(reconcile.Result{}))
Expect(err).ToNot(BeNil())
Expect(err.Error()).Should(ContainSubstring("Release is in a pending (locked) state and cannot currently be modified. Release will be marked failed to allow a roll-forward in"))
})
})
})

When("requested CR release is not present", func() {
When("action client getter is not working", func() {
It("returns an error getting the action client", func() {
Expand Down