
wait and timeout options with Helm-based operator #4102

Closed
jfgosselin opened this issue Oct 26, 2020 · 17 comments

@jfgosselin

Feature Request

Describe the problem you need a feature to resolve.

The Helm-based operator does not wait until Deployments have the minimum number of Pods (Desired minus maxUnavailable) in a ready state before marking the release as successful.

Describe the solution you'd like.

Expose wait and timeout options like the Helm CLI does, and update the CustomResource status accordingly.

--timeout: A value in seconds to wait for Kubernetes commands to complete. This defaults to 5m0s.
--wait: Waits until all Pods are in a ready state, PVCs are bound, Deployments have minimum (Desired minus maxUnavailable) Pods in ready state and Services have an IP address (and Ingress if a LoadBalancer) before marking the release as successful. It will wait for as long as the --timeout value. If timeout is reached, the release will be marked as FAILED. Note: In scenarios where Deployment has replicas set to 1 and maxUnavailable is not set to 0 as part of rolling update strategy, --wait will return as ready as it has satisfied the minimum Pod in ready condition.
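
A minimal sketch of where these flags surface in Helm's Go SDK (illustrative values only; cfg and chrt are assumed to be set up by whatever drives the install):

package helmexample

import (
	"time"

	"helm.sh/helm/v3/pkg/action"
	"helm.sh/helm/v3/pkg/chart"
	"helm.sh/helm/v3/pkg/release"
)

// installWithWait shows which fields on Helm's install action correspond to
// --wait and --timeout; the release name and namespace are illustrative.
func installWithWait(cfg *action.Configuration, chrt *chart.Chart) (*release.Release, error) {
	install := action.NewInstall(cfg)
	install.ReleaseName = "example-konk"
	install.Namespace = "konk"
	install.Wait = true               // same behavior as the --wait flag
	install.Timeout = 5 * time.Minute // same behavior as the --timeout flag
	return install.Run(chrt, map[string]interface{}{})
}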

/language helm

@openshift-ci-robot added the language/helm label Oct 26, 2020
@joelanford (Member)

This seems like it would be helpful.

The one concern I have is that Kubernetes controllers generally don't block, waiting for certain conditions to exist. Instead, they make their observations and changes, and then wait until those changes trigger another reconcile loop. The loop then repeats until steady state is reached.

If Helm's --wait conditions could be implemented as I describe above, would that meet your use case?

This has me thinking about --timeout as well. If the above can be implemented, does a timeout like this make sense in the context of an always reconciling operator? I'm not sure it does. With the Helm CLI, after the timeout occurs, the Helm binary exits, presumably leaving the incomplete release resources as is. But the operator is always running, so it will just keep retrying, waiting, timing out, etc. the exact same way every time with an exponential backoff between retries.
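
Roughly, a non-blocking reconcile along those lines could look like the following sketch. releaseResourcesReady and setReadyCondition are hypothetical stand-ins (not existing helm-operator or Helm code), and the GVK is just an example:

package controllers

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Hypothetical GVK for the example CR used elsewhere in this thread.
var konkGVK = schema.GroupVersionKind{Group: "example.com", Version: "v1alpha1", Kind: "Konk"}

type HelmReleaseReconciler struct {
	client.Client
}

// releaseResourcesReady is a stand-in for a readiness check Helm could expose;
// the real logic would inspect the release's Deployments, PVCs, Services, etc.
func releaseResourcesReady(ctx context.Context, c client.Client, cr *unstructured.Unstructured) (bool, error) {
	return false, nil // placeholder
}

// setReadyCondition is a stand-in for however the operator persists a
// condition on the CR's status subresource.
func setReadyCondition(ctx context.Context, c client.Client, cr *unstructured.Unstructured, cond metav1.Condition) error {
	return nil // placeholder
}

func (r *HelmReleaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	cr := &unstructured.Unstructured{}
	cr.SetGroupVersionKind(konkGVK)
	if err := r.Get(ctx, req.NamespacedName, cr); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Observe the current state; do not block waiting for it to change.
	ready, err := releaseResourcesReady(ctx, r.Client, cr)
	if err != nil {
		return ctrl.Result{}, err
	}

	cond := metav1.Condition{Type: "Ready", Status: metav1.ConditionFalse, Reason: "ResourcesNotReady"}
	if ready {
		cond.Status = metav1.ConditionTrue
		cond.Reason = "ResourcesReady"
	}
	// Record the observation and return; a status change on any watched
	// release resource triggers the next reconcile.
	if err := setReadyCondition(ctx, r.Client, cr, cond); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}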

@jfgosselin (Author)

@joelanford that will meet my use case. You are right, I don't think the timeout makes sense.

@jberkhahn added the help wanted label Oct 26, 2020
@jberkhahn added this to the Backlog milestone Oct 26, 2020
@joelanford (Member)

@jfgosselin Cool! Is this something you would be willing to investigate and contribute?

One thing I'm not sure about is whether the Helm libraries expose the same checks that the --wait flag makes without actually having to wait. If it does, it should be possible to incorporate that check into the reconcile loop and possibly tweak or add a status condition.

@drewwells

This is closely related to #2491. The operator should mark the CR status Ready only after all the resources are ready by k8s's definition of ready.

In Helm, there is an interface that implements readiness checking of multiple resources: https://github.com/helm/helm/blob/9b42702a4bced339ff424a78ad68dd6be6e1a80a/pkg/kube/interface.go#L38-L44

The hooks that perform the checks are here
https://github.com/helm/helm/blob/193850a9e2c509acf1a499d98e8d23c12c134f11/pkg/action/hooks.go#L58-L83

@joelanford (Member)

@drewwells It looks like WatchUntilReady is used only for hook execution, and it also looks like it is always called, even if --wait is not set. So I think the helm-operator is likely already waiting for hooks.

For --wait functionality, we would want to mimic or re-use the KubeClient.Wait() function. Here's the usage in action.Install.

Unfortunately, it looks like the waiter struct and waiter.waitForResources() function are both unexported and they block, so we can't re-use them in the SDK without actually using the action.Install.Wait value on the struct, which I think we should avoid.

Ideally this block of logic would be extracted and exported into a new IsResourceReady(r Resource) (bool, error) function.
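
For illustration, a reconciler could then evaluate readiness without blocking along these lines; isResourceReady below is a hypothetical stand-in for that proposed exported function, not an existing Helm API:

package helmutil

import (
	"helm.sh/helm/v3/pkg/kube"
	"k8s.io/cli-runtime/pkg/resource"
)

// isResourceReady stands in for the proposed exported check; the real body
// would contain the unexported waiter logic, minus the blocking poll loop.
func isResourceReady(r *resource.Info) (bool, error) {
	return true, nil // placeholder
}

// releaseReady reports whether every rendered release resource passes the
// same checks --wait performs, without ever blocking the reconcile.
func releaseReady(resources kube.ResourceList) (bool, error) {
	for _, res := range resources {
		ok, err := isResourceReady(res)
		if err != nil {
			return false, err
		}
		if !ok {
			return false, nil
		}
	}
	return true, nil
}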

@drewwells

I agree about the action package; it's very lacking in flexibility.

You're right about WatchUntilReady. There's another similarly named method that is public and constructs a waiter: https://github.com/helm/helm/blob/593fec6868d7a239032cf234b530aee6f5d52708/pkg/kube/client.go#L116-L128. We won't have any errors to show until the Timeout elapses, though, and possibly no other reconciliations in the meantime.

What is the objection to using the action.Install.Wait flag? I think even using client.Wait here will have limited visibility from within the operator itself.

@drewwells mentioned this issue Oct 26, 2020
@joelanford (Member)

@drewwells There are a few reasons I don't think we should use action.Install.Wait.

As I mentioned in my first comment, it is generally an anti-pattern to wait in Kubernetes controllers. The more accepted pattern is to observe the current state of the CR and its resources (e.g. ask is the release ready?), make any changes (e.g. set CR status based on whether the resources are ready), and immediately return. Then if/when any of the status blocks of any of the release resources change, our reconciler will be triggered again, and we can again check if the release is ready. This process repeats until all of the release resource status blocks achieve steady state. If the final steady state is that all of the resources are ready, then the reconciler will set the CR status to reflect that.

With this design the operator can scale to handle a very large number of CRs and their releases being active at the same time.

If the reconciler is allowed to block, the operator will only be able to handle MaxConcurrentReconciles CRs simultaneously. Once that number of reconcilers is waiting, all other events will be queued until an existing CR is finished waiting or it times out.

Another benefit of this design is that the question of whether a release is ready is always checked. If steady state was reached and the release became ready, that doesn't mean it will ALWAYS be ready. If a pod terminates or a service loses its endpoints, the reconciler will be triggered and it will change the CR status from ready to not ready.

drewwells added a commit to infobloxopen/operator-sdk that referenced this issue Oct 27, 2020
drewwells added a commit to infobloxopen/operator-sdk that referenced this issue Oct 27, 2020
@drewwells

We can cheat here just like the Helm CLI does. It marks the release as ready as soon as it becomes available, even when that is only true for a short period. Programs that do not have sufficiently long startupProbes will pass as ready and then start crash looping; Helm does not catch this.

I think here we need to distinguish between operations helm can handle vs trying to micromanage resources of a release.

If a subresource is deleted, the operator could sort out how to bring it back. This can be handled by watching subresources for CUD events https://godoc.org/sigs.k8s.io/controller-runtime/pkg/builder#Builder.Owns.
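
For example, wiring owned-resource watches with controller-runtime could look roughly like this sketch; the GVK and owned kinds are illustrative, since the helm-operator would derive them from the rendered chart at runtime:

package controllers

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// setupWatches wires the CR and a couple of owned release resources into one
// controller, so create/update/delete events on the owned objects requeue the
// owning CR.
func setupWatches(mgr ctrl.Manager, r reconcile.Reconciler) error {
	konk := &unstructured.Unstructured{}
	konk.SetGroupVersionKind(schema.GroupVersionKind{
		Group: "example.com", Version: "v1alpha1", Kind: "Konk",
	})

	return ctrl.NewControllerManagedBy(mgr).
		For(konk).
		Owns(&appsv1.Deployment{}). // events on owned Deployments requeue the CR
		Owns(&corev1.Service{}).    // likewise for owned Services
		Complete(r)
}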

The other one, I guess, is to update the status if the release is in bad health. I'm not so certain the operator is responsible for this. Most operators I have used act sort of like k8s ReplicaSets. They will do very simple things like recreate pods when deleted and update statuses if things go wrong. They can't fix a broken pod and don't try to.

Keeping this in mind, it seems ideal that the helm-operator watches all the sub-resources and attempts to make some decisions on them, at least until the initial ready state; continuous watching after that is a nice-to-have. I suspect the current behavior, which appears to be polling, already syncs the state over time: if a ConfigMap is mutated, a reconcile is requeued, the resource owner is looked up, the desired state is found, and the ConfigMap is set back to the expected state. It sounds expensive memory-wise to watch this many objects, so polling is fine here.

@jfgosselin (Author)

> @jfgosselin Cool! Is this something you would be willing to investigate and contribute?
>
> One thing I'm not sure about is whether the Helm libraries expose the same checks that the --wait flag makes without actually having to wait. If it does, it should be possible to incorporate that check into the reconcile loop and possibly tweak or add a status condition.

@joelanford sorry, I've never coded in Go

@joelanford (Member)

@jfgosselin No worries. This is not currently a high priority for the core SDK team maintainers, but it is definitely a nice feature, and we would definitely accept a PR.

I think the direction we would want to go is to:

  1. work upstream in the Helm project to expose the release readiness check as an exported function
  2. incorporate a new release of Helm that contains 1, such that the Helm operator reconcile loop performs Helm's readiness checks during every reconciliation without waiting for all the release resources to actually be ready.

If anyone is interested in taking this work on, let us know!

@jfgosselin (Author)

BTW, I'm not sure how they did it in the Flux Helm Operator, but I think they support this feature (we didn't use Flux for some other reasons).

@joelanford (Member)

I didn't look too closely, but it looks like they just allow the .Wait setting to be used directly. If anyone who sees this has some knowledge about this particular aspect of the Flux Helm Operator, I'd be interested in knowing how/if the concurrency and reconciliation problems I mentioned above are handled.

@drewwells

Workaround: kubectl wait can block on status condition types. It's not currently possible to use it with InstallSuccessful because it is listed as a reason instead of a type. You have to get the CR and then filter the output looking for InstallSuccessful:

kubectl -n konk get konk example-konk -o jsonpath="{range .status.conditions[*]}{@.type}={@.reason}{'\n'}{end}"
Initialized=
Deployed=InstallSuccessful

The current condition looks like this:

- lastTransitionTime: "2020-10-27T21:48:35Z"
  message: |
    1. Get the application URL by running these commands:
      export POD_NAME=$(kubectl get pods --namespace konk -l "app.kubernetes.io/name=konk,app.kubernetes.io/instance=example-konk" -o jsonpath="{.items[0].metadata.name}")
      echo "Visit http://127.0.0.1:8080 to use your application"
      kubectl --namespace konk port-forward $POD_NAME 8080:80
  reason: InstallSuccessful
  status: "True"
  type: Deployed

If there were a condition with type "DeployedSuccessful", then users could use kubectl to wait on that condition before taking action:

until kubectl wait --timeout=1s --for=condition=DeployedSuccessful -n konk konk example-konk; do echo "still deploying"; sleep 1s; done

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label Jan 26, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot removed the lifecycle/stale label Feb 25, 2021
@openshift-ci-robot added the lifecycle/rotten label Feb 25, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting /reopen.
> Mark the issue as fresh by commenting /remove-lifecycle rotten.
> Exclude this issue from closing again by commenting /lifecycle frozen.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
