Skip to content
This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Disable garbage collection for specific resource #2841

Closed
phillebaba opened this issue Feb 11, 2020 · 11 comments
Closed

Disable garbage collection for specific resource #2841

phillebaba opened this issue Feb 11, 2020 · 11 comments

Comments

@phillebaba
Copy link
Member

Describe the feature
Garbage collection works great but there are sometimes scenarios where it could be very inconvenient if a resource was accidentally deleted. This is especially for resources that map to external dependencies such as LBs or other cloud resources create by for example aws-servicebroker. In a perfect world this would not happen but the reality is that mistakes happen. Especially as the amount of manifests grows and large amounts of resources can be removed from being generated by small line changes. So it would be nice to be able to selectively disable garbage collection for certain resources.

What would the new user story look like?

  1. Provide a new annotation that could be added to any resource. It could look like fluxcd.io/garbage-collection=false or fluxcd.io/gc=false indicating that the resource should not be garbage collected.
  2. User starts up Flux using the --sync-garbage-collection flag
  3. New manifest is created with the new annotation and committed to git
  4. Flux syncs with git repository deploys the resource
  5. Manifest is removed and committed to git
  6. Flux skips garbage collection as the resource

Expected behavior
When the resource is removed from git and is marked as an orphaned resource by flux during the sync process it should be skipped by the garbage collector.

@phillebaba phillebaba added blocked-needs-validation Issue is waiting to be validated before we can proceed enhancement labels Feb 11, 2020
@squaremo squaremo removed the blocked-needs-validation Issue is waiting to be validated before we can proceed label Feb 12, 2020
@squaremo
Copy link
Member

squaremo commented Feb 12, 2020

I can see the use case for this -- well explained.

There's an annotation fluxcd.io/ignore that tells flux to ignore a resource with respect to syncing. It's explained here: https://docs.fluxcd.io/en/1.18.0/faq.html#can-i-temporarily-make-flux-ignore-a-manifest. I think you are describing something different, but let's check: how would fluxcd.io/gc=false interact with fluxcd.io/ignore=true?

@phillebaba
Copy link
Member Author

I had a look at the ingore annotation also but it has the downside that any changes would not be synced. Using the ignore annotation would be a two step process. You would first have to commit your manifests without the annotations, and wait for flux to sync them to the cluster. Then add the annotation and commit once again. That would disable any changes in the future, while I just want to stop the GC from removing the resource, but still allowing it to be updated.

Seems like it would be feasible to implement here where the sync checks if dry run is enabled or not, so it would also skip resources with the annotation.

What I am not sure about though is that long term effects of this logic. For example if a resource is removed from git, not GCd and then the git is reverted. Would Flux "recover" the resource or would it conflict when trying to create a new resource with the same name.

@squaremo
Copy link
Member

What I am not sure about though is that long term effects of this logic. For example if a resource is removed from git, not GCd and then the git is reverted. Would Flux "recover" the resource or would it conflict when trying to create a new resource with the same name.

Good question! What do you reckon would happen, going through it step by step? (you already found the relevant code :-) )

@phillebaba
Copy link
Member Author

I did some quick tests and it turns out that the ignore annotation will not stop the GC from deleting ignored resources. The ignore check will only stop the resource from being staged in the changeSet. But the garbage collector is given the whole SyncSet, and will remove any resource that has a checksum but is not in the SyncSet.

Thinking about it a bit more there should not be any issues. Flux would just update the resource if it was added to the cluster again just like it behaves with GC turned off. I will implement the annotation and see how well it works.

@2opremio
Copy link
Contributor

2opremio commented Feb 12, 2020

Just add the ignore annotation to the resource in the cluster (not to the manifest you delete)

@2opremio
Copy link
Contributor

2opremio commented Feb 12, 2020

@alexmt I guess this isn't currently supported by the GitOps Engine?

@phillebaba
Copy link
Member Author

So I tried out what you suggested and the resource is still being deleted. I might be doing something wrong, but looking at the code it seems like the apply and the GC resources are fetched in different manners.

I tried implementing my idea with an additional annotation if you want to have a look.
phillebaba@14b5c0e

@squaremo
Copy link
Member

So I tried out what you suggested and the resource is still being deleted.

Well we are both being misdirected, since I just tried it and had the same result. Method:

  • set flux up pointed at a git repo
  • fluxctl sync
  • use kubectl annotate ... fluxcd.io/ignore="true" to annotate a resource that's in the git repo
  • remove the manifest for the resource from the git repo
  • fluxctl sync
  • is the resource still there? (and what does the fluxd log say it did?)

There's a test that claims to verify that it works: https://github.com/fluxcd/flux/blob/master/pkg/cluster/kubernetes/sync_test.go#L706

So the first thing would be to figure out why that test does not in fact verify the intended behaviour. What do you reckon @phillebaba ?

(There is still room for the suggested feature; but, we need to get the house in order!)

@phillebaba
Copy link
Member Author

phillebaba commented Feb 18, 2020

I had a quick look at the test and the problem isn't the test, it is actually testing what it claims to be testing. The important part of the test description is "pre-existing", meaning resources not created by flux.

If you do the following you would get the same result as the test:

  • set flux up pointed at a git repo
  • create a manifest file and manually apply it to the cluster
  • annotate the resource with fluxcd.io/ignore="true"
  • commit the same manifest to the git repo flux is pointing to
  • fluxctl sync
  • remove the file from the repo and commit the changes

The result will be that flux will never touch the resource, no updates will be applied and neither will it be deleted. The reason is that it is never tracked by flux as being synced so it will no be included in the resources that are garbage collected. If the ignore annotation is removed and then added back flux will modify the resource and track it, allowing the resource to be garbage collected.

Either we accept this behavior as expected as it is already tested. Or we change the behavior so that it also applies to ignored resources that have been created by flux.

@squaremo
Copy link
Member

Looking back at the original garbage collection PR #1442, I see that some idiot (oh it's me) argued for "ignore" meaning "don't garbage collect", then failed to make that actually how it works. I must have a blind spot on this particular thing.

OK so there are two things at stake:

  • "ignore" should imply "don't garbage collect"
  • there is a need for an annotation meaning "update, but don't garbage collect"

@phillebaba I reckon put your work to date in a PR, and we can transfer discussion to there.

@stefanprodan
Copy link
Member

Fixed by #2858

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants