-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow marking releases stuck in a pending state as failed #116
Allow marking releases stuck in a pending state as failed #116
Conversation
Pull Request Test Coverage Report for Build 1593318507
💛 - Coveralls |
9166fc6
to
c26cd1e
Compare
a48929e
to
8805e70
Compare
8805e70
to
2961ecb
Compare
@joelanford @varshaprasad96 @fabianvf On the
Imho if a user wants to do manually upgrade their release the operator should be disabled. WDYTH on this? |
In this step, you're saying an admin runs If so, I think your scenario makes sense, and I agree that the operator should reconcile back (by performing another upgrade) to the desired state as specified by the CR. I was originally thinking of the reverse scenario:
I was originally arguing that "no, it should not", but the more I think about it, I think I'm changing my mind. I could see 2 options for my scenario:
Both of these options align with your scenario because the custom object would exist prior to the |
pkg/client/actionclient.go
Outdated
func (c *actionClient) MarkFailed(rel *release.Release, reason string) error { | ||
infoCopy := *rel.Info | ||
releaseCopy := *rel | ||
releaseCopy.Info = &infoCopy | ||
releaseCopy.SetStatus(release.StatusFailed, reason) | ||
return c.conf.Releases.Update(&releaseCopy) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I see how it's convenient to put this functionality into the actionClient, I'm not convinced it makes a ton of sense otherwise. There's really only one way to mark a release as failed (this is it), so I'm wondering if we pull this back out of the action client interface and just put this logic directly into the reconciler.
The only missing piece I see for that is giving the Reconciler an ActionConfigGetter
field, which would likely just involve adding the field, adding a WithActionConfigGetter
functional option, and then tweaking the addDefaults
function slightly to handle the fact that the reconciler may already have an ActionConfigGetter
setup via the new functional option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, extracted it and now head to adding a fake implementation for the ActionConfigGetter
. After that it should be finished. The implementation is still a bit rough though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay @joelanford.
I've tried to fake the ActionConfig
but it was more complex than expected, i.e. also interacting with the storage.Storage
interface to call the Update
func for the given release.
I stopped from there and wondered if it fits the abstraction if the MarkFailed
func is renamed to Update
to be more generic, so the ActionClient
wraps all interactions with Helm in a single struct.
Also the already existing fake client can be leveraged and easily extended.
If theUpdate
func does not match the expectations I would implement a fake ActionConfig
with a memory release driver in a separate PR.
I agree. An operator ideally should only reconcile resources which belong to them or which are explicitly labeled to allow it (opt-in to reconcile).
I think adopting the release is reasonable if opted-in. This seems to be outside of the scope of this PR, created an issue for it: #144 |
pkg/reconciler/reconciler.go
Outdated
@@ -796,8 +871,8 @@ func (r *Reconciler) addDefaults(mgr ctrl.Manager, controllerName string) { | |||
r.log = ctrl.Log.WithName("controllers").WithName("Helm") | |||
} | |||
if r.actionClientGetter == nil { | |||
actionConfigGetter := helmclient.NewActionConfigGetter(mgr.GetConfig(), mgr.GetRESTMapper(), r.log) | |||
r.actionClientGetter = helmclient.NewActionClientGetter(actionConfigGetter) | |||
r.actionConfigGetter = helmclient.NewActionConfigGetter(mgr.GetConfig(), mgr.GetRESTMapper(), r.log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved out of the r.actionClientGetter == nil
if block and into its own, right?
if r.actionConfigGetter == nil {
r.actionConfigGetter = helmclient.NewActionConfigGetter(mgr.GetConfig(), mgr.GetRESTMapper(), r.log)
}
if r.actionClientGetter == nil {
r.actionClientGetter = helmclient.NewActionClientGetter(r.actionConfigGetter)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree and done
@joelanford and @SimonBaeumer to pair on this |
@@ -531,6 +544,10 @@ func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (res ctrl. | |||
) | |||
return ctrl.Result{}, err | |||
} | |||
if state == statePending { | |||
return r.handlePending(actionClient, rel, &u, log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joelanford To be honest, I don't get to a good solution here and would like that you take a decision here.
1. handlePending in actionClient
My suggestion would be moving the handlePending
to the actionClient
.
This makes sense as it is a Helm state which is handled such the actionClient abstracts Helm interactions.
To include handlePending
in handleReconcile
the statePending
must be checked at a different execution time (within the switch state
in line 559).
2. Moving handlePending to the switch state
As far as I see the only disadvantage is that pre-hooks
are executed before the pending state is resolved. This is not necessarily bad and depends more on our definition of pre-hooks/extensions
functions.
@SimonBaeumer: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Closing as stale. Please re-open if necessary. |
PR to discuss how to recover from pending state.
This is our current implementation used in https://github.com/stackrox/helm-operator
Fixes #94
Currently if the Helm releases exits unexpectedly (i.e. due to node crash or a bug) the pending state of a Helm release is never released and leads to an infinite reconciliation loop.
To automatically resolve pending states the reconciler now takes an option via
WithMarkFailedAfter
which configures a timeout after the pending state is handled as a failure.ToDo