
Wait and Prune options do not work as expected with Kustomize configMapGenerator #1180

Open
squer-rcapraro opened this issue Jun 10, 2024 · 5 comments


@squer-rcapraro

In this message I refer to old and new versions/commits. This is equivalent to pushing a new commit to a repository that is monitored by a GitRepository and a Kustomization custom resource, as shown below.

We use Kustomize configMapGenerator to generate ConfigMaps, and Flux to deploy them using a Kustomization CR. Our Flux Kustomization CR has both the wait and prune parameters set to true. We use wait to ensure that our deployments are "Ready" (i.e., work nominally) before removing the old version of the application. We have occasionally observed that, during the rollout of a new version (of a Deployment, for instance), ConfigMaps referenced by the old version are deleted before the reconciliation of the Kustomization completes.

I assume this is due to pruning happening before the health check validation in the reconciliation loop (specifically, because the wait parameter is evaluated as part of the checkHealth function).

Why is this a problem? Because if the new deployment is unhealthy, the old ConfigMaps have already been deleted, so any old pods that get restarted will fail with a "configmap not found" error. This can result in our cluster going unhealthy during a rollout.

Can you please confirm whether my reasoning is correct, whether this is expected behaviour, and whether we could move the "pruning" part after the health checks?

More info below. Please let me know if I should add more details.

I have not managed to create a test for this yet, but I have run the existing tests with garbage collection moved after the health checks (wait), and this does not seem to introduce any regressions:

	// Run the health checks for the last applied resources.
	isNewRevision := !src.GetArtifact().HasRevision(obj.Status.LastAppliedRevision)
	if err := r.checkHealth(ctx,
		resourceManager,
		patcher,
		obj,
		revision,
		isNewRevision,
		drifted,
		changeSet.ToObjMetadataSet()); err != nil {
		conditions.MarkFalse(obj, meta.ReadyCondition, meta.HealthCheckFailedReason, err.Error())
		return err
	}
	// Run garbage collection for stale resources that do not have pruning disabled.
	if _, err := r.prune(ctx, resourceManager, obj, revision, staleObjects); err != nil {
		conditions.MarkFalse(obj, meta.ReadyCondition, meta.PruneFailedReason, err.Error())
		return err
	}

Kustomize configMapGenerator appends a content-hash suffix to every ConfigMap it generates. For example, the configuration below results in a ConfigMap named something like test-naskvbw (random-looking suffix). When rolling out a new version (commit) that changes the generated content, the new ConfigMap gets a different suffix (e.g. test-dagsreg).

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deploy.yaml

configMapGenerator:
- name: test
  literals: ...
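
For illustration, the rendered output of such a generator looks roughly like this (suffix taken from the example above, data values hypothetical):

# output of `kustomize build` (abridged)
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-naskvbw
data:
  TEST: some-value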

Let's now assume I have this Kustomization CR, which has both wait and prune set to true.

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: test
spec:
  sourceRef:
    kind: GitRepository
    name: myRepo
  path: ...
  prune: true
  wait: true

And that the Deployment references the ConfigMap (some parts are omitted for brevity):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  template:
    spec:
      containers:
        - name: test
          env:
            - name: TEST
              valueFrom:
                configMapKeyRef:
                  name: test
                  key: ...

I would expect that the Kustomize Controller (Flux) waits for the health checks (of all reconciled resources, determined by the wait parameter, see docs here) to be green before performing the garbage collection, i.e. removing the old configmap (test-naskvbw in the example).

@stefanprodan
Member

we could move the "pruning" part after the health checks?

This is not an option; changing the order would be a major breaking change requiring Flux v3. You could annotate the generated ConfigMaps with kustomize.toolkit.fluxcd.io/prune: Disabled and clean them up manually from some CronJob.
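
For reference, a minimal sketch of that workaround, setting the annotation through the generator's options field (literal values are hypothetical; confirm the exact annotation value and casing against the Flux docs):

# kustomization.yaml
configMapGenerator:
- name: test
  literals:
    - TEST=some-value
  options:
    annotations:
      kustomize.toolkit.fluxcd.io/prune: disabled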

@ulrichwinter

Rolling back an unhealthy rollout is a typical use case.
How is this supposed to work in Flux v2 if a changed generated ConfigMap is involved?

@stefanprodan
Member

You would roll forward instead of backwards: commit the fix to Git and Flux will move the deployment to a new revision. A safe way to roll back is provided by Flagger, which maintains a clone of the Flux-managed Deployment and all of its ConfigMap/Secret refs.

@kingdonb
Member

kingdonb commented Jun 11, 2024

If you haven't changed both timeout and interval, they are set to equal values by default, and this rollback period will not occur. If you're experiencing an inconsistent state during rollbacks, I'd check that those resources have spec.interval == spec.timeout in the Kustomization, and set interval long enough that an SRE can respond in the event of a failed rollout (and do monitor for that condition). The Helm controller won't get caught in a retry loop by default, though that is possible if the operator has set certain configuration. If you are not using Helm but just Deployments directly, then the kustomize controller with default settings will simply retry the apply forever, once every interval, and will never perform a rollback. This sounds like the configuration you need.

Be careful with setting a longer-than-default interval as well: lengthening the interval equally draws out the timeout, and you can easily tie up the kustomize controller with concurrency=n stalled reconcilers; the default value of 10m is pretty reasonable given all this. A longer interval can give an SRE more time to respond before the failing deployment is retried, which makes more of a difference if you are using the Helm controller with indefinite retries (probably not your configuration, based on what I gathered here).

There are some fiddly things we should have documented better, like this one. I used to tell everyone to set timeout shorter than interval, without regard for the fact that some things work differently when these values are not left at their defaults.
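
A minimal sketch of the settings described above, reusing the Kustomization from the issue (values illustrative):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: test
spec:
  # keep interval == timeout, and long enough for an SRE to respond to a failed rollout
  interval: 10m0s
  timeout: 10m0s
  prune: true
  wait: true
  sourceRef:
    kind: GitRepository
    name: myRepo
  path: ...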

@squer-rcapraro
Author

Thanks both for the comments! Indeed, as mentioned in the opening comment, rollback is not really the issue here.

Because if the new deployment is unhealthy, the old ConfigMaps have already been deleted, so any old pods that get restarted will fail with a "configmap not found" error. This can result in our cluster going unhealthy during a rollout.

The main problem is really that, given the use of Kustomize configMapGenerator and the order of garbage collection and health checks, the old ConfigMaps get cleaned up before the new version is healthy, leaving the environment "potentially unhealthy".

About Flagger:

  • Yes, Flagger solves the problem, but it comes with its own complexity and overhead. We are in the process of rolling out Flagger, but we still have some limitations that prevent us from using it reliably. Without Flagger (i.e. only Flux), I still believe that Stefan's answer is going to be the solution for us. Also, we wanted to use Flagger only for deployments that really need a canary rollout, not for all deployments (e.g. deployments without an Ingress/HPA).

One more point:

  • This works only if the ConfigMaps are referenced by Deployments. There are cases in which ConfigMaps are generated and deployed with Kustomize configMapGenerator but not referenced by any Deployment (e.g. they are used as configuration objects watched by an operator; a very niche use case, I admit). In that case, Flagger would not solve the problem of keeping the "old ConfigMaps" in the cluster until the health checks are green.

About timeout/interval

We currently have the following settings for the majority of our Kustomizations (as far as I understand, this should not have any downsides when using the kustomize controller):

kind: Kustomization
spec:
  interval: 5m0s
  timeout: 10m0s
  prune: true
  force: true
  wait: true

We also do not currently have problems with kind: HelmRelease.

To summarise

I still believe that garbage collection being executed before the health checks causes issues in the scenarios shown above, and maybe we could fix it in Flux v3 😉. Everything else is on our side to solve. Thanks again!
