The new version of HelmRelease does not trigger the update instantly, it will wait until the timeout expires. #4923
Comments
Similar conversation: #1000 (comment)
I probably won’t tell you the API version, but here is the operator version.
It is not possible to interrupt the reconciler in Flux v2.
@stefanprodan, can a HelmRelease with the following annotations interrupt an existing release and start a new one?
I've felt this issue in kustomize-controller as well. It might be to our benefit to introduce a way for new reconciliations to interrupt a timeout-in-progress, but it's hard to imagine that it wouldn't also be a breaking change. The best recommendation I can make without breaking anything is to set your timeout lower. Do you expect deployments to take a full hour? If not, then why set the timeout that high?
In my case, if I reduce the timeout, I get a release in the failed state. And if I remove the wait, then I can't rely on the release statuses. It is not so important why people set such-and-such timeouts (there are reasons for this). What is important is that you cannot start a new installation while the Helm controller is waiting for a timeout. =( Even if we assume that you have a smaller timeout, say 15 minutes: if you make an update that has an error and will never succeed, you are forced to wait those 15 (or however many) minutes until you can fix it. It is clear that the smaller the timeout, the less painful it is. But there are cases when a long timeout is necessary. And so each attempt to fix a faulty release will cost you a lot of time (and this can be critical). The old Helm operator initiated the installation of a new Helm release immediately, despite the timeouts.
@kingdonb This has troubled me before. I do think some of us used a timeout of 60m because it is actually mentioned in the docs as a recommendation: https://fluxcd.io/flux/components/kustomize/kustomizations/#working-with-kustomizations Maybe a note could be added to inform users of the trade-offs of setting it high vs. low? :-)
I'm not sure we are looking at the same doc; this one says:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: webapp
  namespace: apps
spec:
  interval: 60m0s # detect drift and undo kubectl edits every hour
  wait: true      # wait for all applied resources to become ready
  timeout: 3m0s   # give up waiting after three minutes
```

Additionally it says:

```yaml
  retryInterval: 2m0s # retry every two minutes on apply or waiting failures
```

So when the Kustomization times out at 3m or fails early due to an error, it will retry after 2m. This is the recommended configuration, because otherwise, with an interval of 60m, you would have to wait up to an hour for the next attempt.
Whoops, you are definitely right. I experienced having a hard time with this some time ago, probably because I was not setting the `retryInterval`. Hmm, I might not understand it fully still.
The paradigm is, sort of: either set the interval on your Sources low, or instrument them with a webhook Receiver.

So your Kustomization timeout should be at least long enough for a deployment to complete in normal circumstances, but not more than about 2x that long. The interval at 10m is a good default, but it can be increased (made longer) to reduce the load created on the cluster control plane by the Kustomization dry runs that happen every interval. If you do that, you had better set the timeout shorter, because there's rarely any good reason to lengthen the timeout past the default 10m value.

If you're in an environment where a rollback can cause a disaster, then you should keep the timeout equal to the interval. Usually you can shorten it, and use retryInterval to keep the duration of a rollback short when a deployment has timed out. (When interval is equal to timeout, the duration of a rollback is zero, i.e. the controller just proceeds directly into another reconcile attempt.)
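A minimal sketch of the two variants described above, assuming a Git source; the names, namespace, and path are placeholders, not taken from this thread:

```yaml
# Variant A: rollback-averse — timeout equal to interval, so a timed-out
# reconcile rolls directly into the next attempt (zero rollback window).
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: rollback-averse   # placeholder name
  namespace: flux-system
spec:
  interval: 10m
  timeout: 10m            # equal to interval, per the advice above
  wait: true
  prune: true
  path: ./deploy          # placeholder path
  sourceRef:
    kind: GitRepository
    name: example-repo    # placeholder source
---
# Variant B: fast feedback — a timeout sized to a normal deployment,
# with retryInterval keeping the window after a failure short.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: fast-feedback     # placeholder name
  namespace: flux-system
spec:
  interval: 10m
  timeout: 3m
  retryInterval: 2m
  wait: true
  prune: true
  path: ./deploy          # placeholder path
  sourceRef:
    kind: GitRepository
    name: example-repo    # placeholder source
```

Which variant fits depends on whether an automatic retry after a timed-out deployment is acceptable in your environment.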
Thanks for the thorough info here! It makes sense :-) I have used the 60m timeout before, as discussed above.

However, I played around with the settings a bit in my homelab, and found that the recommended settings work well when reconciliation is triggered by an external call:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
...
spec:
  interval: 60m
  timeout: 3m
  retryInterval: 2m
...
```

This triggers an instant update, and fits well into workflows where the reconciliation is triggered from a pipeline, and not from the interval. Before today I relied on low interval values (without setting timeout) to accomplish the same, but that often put me in a situation where either it was inconvenient to wait for the timeout, or deployments failed because the timeout was too low. So thanks for sharing @kingdonb!
You can indeed use OCIRepository with Receiver, though I cannot find it documented anywhere in https://fluxcd.io/flux/components/notification/receivers/#resources

I have an example of it here, for use on GitHub (with GHCR). If it's missing from the docs, we should add it. (The event to watch is `package`.)

```yaml
apiVersion: notification.toolkit.fluxcd.io/v1
kind: Receiver
metadata:
  ...
spec:
  resources:
    - apiVersion: source.toolkit.fluxcd.io/v1beta2
      kind: OCIRepository
      name: flux-docs
  secretRef:
    name: flux-docs-webhook
  type: github
  events:
    - "package"
```

Hope this helps!
Describe the bug
For example, you have HelmRelease with a timeout of 60 minutes (waiting for a job that takes a long time to complete).
You make a change and apply the new version of HelmRelease while the previous installation with a timeout of 60 minutes is still running.
But the Helm controller does not stop the previous installation; it waits until the timeout ends, and only then begins installing the new version of the HelmRelease.
In the previous version (helm-operator) it interrupted the current installation and started a new one.
Isn't the previous behavior better, where, when a new version of the HelmRelease appears, the controller stops installing the previous version and starts installing the new one?
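To make the scenario concrete, here is a minimal sketch of a HelmRelease like the one described; the names, namespace, and chart reference are placeholders, not taken from the report:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: long-job-app        # placeholder name
  namespace: default
spec:
  interval: 10m
  timeout: 60m              # wait up to an hour for the long-running job
  chart:
    spec:
      chart: long-job-app   # placeholder chart name
      sourceRef:
        kind: HelmRepository
        name: example-repo  # placeholder source
```

Applying a new generation of this object while the previous install is still inside its 60m wait is what exhibits the behavior described: the in-progress release is not interrupted.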
Steps to reproduce
Expected behavior
The new release is applied immediately: waiting on the previous one is cancelled, and installation of the new one begins.
Screenshots and recordings
No response
OS / Distro
N/A.
Flux version
v2.13.0
Flux check
N/A.
Git provider
No response
Container Registry provider
No response
Additional context
No response
Code of Conduct