Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Helm Release Recreated After HelmRelease Is Garbage Collected #1873

Closed
stevenpall opened this issue Mar 27, 2019 · 10 comments · Fixed by #1906
Closed

Helm Release Recreated After HelmRelease Is Garbage Collected #1873

stevenpall opened this issue Mar 27, 2019 · 10 comments · Fixed by #1906
Assignees

Comments

@stevenpall
Copy link
Contributor

I'm having a strange issue with the new --sync-garbage-collection feature. Basically, when I remove a HelmRelease file from Git and commit it, Flux picks this up properly and deletes the corresponding HelmRelease, but it also appears to create a new Helm release afterwards. This causes there to be orphaned Helm releases left without a corresponding HelmRelease, and thus, this needs to be cleaned up manually. Example log:

Mar 27 09:09:45 flux-helm-operator-6b4c7c767d-nvzcx.flux.derp flux-helm-operator: ts=2019-03-27T16:09:45.191264505Z caller=release.go:147 component=release info="processing release test (as dbbb37c2-50a8-11e9-8ff7-16b5dbb64816)" action=CREATE options="{DryRun:true ReuseName:false}" timeout=300s
Mar 27 09:11:22 flux-5f5c67fb69-jwv7s.flux.derp flux: ts=2019-03-27T16:11:22.154589668Z caller=sync.go:135 component=cluster info="cluster resource not in resources to be synced; deleting" resource=test:helmrelease/test
Mar 27 09:11:22 flux-5f5c67fb69-jwv7s.flux.derp flux: ts=2019-03-27T16:11:22.238287725Z caller=sync.go:483 component=cluster method=Sync cmd="kubectl delete -f -" took=83.580268ms err=null output="helmrelease.flux.weave.works \"test\" deleted"
Mar 27 09:11:23 flux-helm-operator-6b4c7c767d-nvzcx.flux.derp flux-helm-operator: ts=2019-03-27T16:11:23.63301596Z caller=release.go:116 component=release info="Deleting release test"
Mar 27 09:11:25 flux-helm-operator-6b4c7c767d-nvzcx.flux.derp flux-helm-operator: ts=2019-03-27T16:11:25.581917569Z caller=release.go:246 component=release info="Release deleted: [test]"
Mar 27 09:11:58 flux-helm-operator-6b4c7c767d-nvzcx.flux.derp flux-helm-operator: ts=2019-03-27T16:11:58.769620249Z caller=release.go:147 component=release info="processing release test (as test)" action=CREATE options="{DryRun:false ReuseName:false}" timeout=300s
Mar 27 09:12:03 flux-helm-operator-6b4c7c767d-nvzcx.flux.derp flux-helm-operator: ts=2019-03-27T16:12:03.710489868Z caller=chartsync.go:361 component=chartsync warning="could not update the release revision" namespace=test resource=test err="helmreleases.flux.weave.works \"test\" not found"
@hiddeco
Copy link
Member

hiddeco commented Apr 1, 2019

Can you share some details about your chart, HelmRelease and/or setup?

I tried to reproduce it with the stable/redis chart without any luck.

@stevenpall
Copy link
Contributor Author

Sure, so here is an example of the HelmRelease:

apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.chart-image: glob:pr-test-*
  name: pr-test
  namespace: test
spec:
  chart:
    git: ssh://git@github.com/test/flux-configs
    path: charts/test
    ref: master
  releaseName: pr-test
  values:
    aValue:
      enabled: true

The chart is stored in the same Flux config repo and isn't anything out of the ordinary.

These HelmReleases get created and deleted automatically by this tool: https://github.com/stevenpall/github-actions/tree/master/flux-pull-request. It only interacts with the Flux config repo though, so it shouldn't be any different than a user creating/deleting the HelmRelease files in Git.

@stevenpall
Copy link
Contributor Author

stevenpall commented Apr 1, 2019

I just noticed this in the logs:

ts=2019-04-01T17:23:59.091845202Z caller=chartsync.go:556 component=chartsync error="Release pr-test: values have diverged due to manual Chart release"

I'm not sure what would've caused that or if it could be related to this issue. The release was not updated outside of Git for reference.

@hiddeco
Copy link
Member

hiddeco commented Apr 1, 2019

I'm not sure what would've caused that or if it could be related to this issue.

It is possible to log the diff by configuring the --log-release-diffs argument on the Helm operator.

@stevenpall
Copy link
Contributor Author

I tried adding that flag, but I'm not seeing the diffs being logged for some reason. It should just be logged to stdout?

@hiddeco
Copy link
Member

hiddeco commented Apr 1, 2019

Had a fruitful chat with @stevenpall on Slack.

The problem is that we collect all resources in memory and loop over them to do a release for each one, in case you have a lot of releases (31 to be precise in this case), the release will still be scheduled after it has been removed.

The quick fix would be to attempt to resolve the HelmRelease just before we are trying to perform an install and return early in case it does not exist.

The better fix in my opinion would be to queue releases and dequeue a scheduled release when the HelmRelease is removed.

@squaremo / @2opremio, thoughts?

@squaremo
Copy link
Member

squaremo commented Apr 2, 2019

The quick fix would be to attempt to resolve the HelmRelease just before we are trying to perform an install and return early in case it does not exist.

I like the slightly more general form of this, which is to check that nothing has changed, just before attempting the release. That includes whether the HelmRelease has been removed, but also if it's been changed.

The better fix in my opinion would be to queue releases and dequeue a scheduled release when the HelmRelease is removed.

The operator.go has a queue of what has been added/updated/deleted (it calls ReconcileReleaseDef when an addition or update occurs). Driving all releases/deletions through that might be possible. That would simplify things, especially if it also replaced the ticker in chartsync.go, such that the only thing it was actively monitoring was the git mirrors.

@2opremio
Copy link
Contributor

2opremio commented Apr 2, 2019

the release will still be scheduled after it has been removed.

I don't fully understand this. You mean that the release is scheduled by the helm operator? How come we are releasing a garbage-collecting in the iteration? Does it mean we attempt to release when nothing has changed?

@hiddeco
Copy link
Member

hiddeco commented Apr 4, 2019

How come we are releasing a garbage-collecting in the iteration?

Because we request all known HelmReleases and iterate over them one-by-one.

https://github.com/weaveworks/flux/blob/19da0fc97dcf22e67a2a20f487ea1a9717f1b003/integrations/helm/chartsync/chartsync.go#L391-L398

Does it mean we attempt to release when nothing has changed?

No, we just look at each HelmRelease and do a dry-run release to see if anything has changed and we need to do an actual release.

@2opremio
Copy link
Contributor

2opremio commented Apr 5, 2019 via email

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants