This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

Delete resources no longer in git #1442

Merged: 24 commits from feature/stack-tracking into fluxcd:master on Feb 27, 2019

Conversation

@Timer (Contributor) commented Oct 10, 2018

This pull request adds stack[1] tracking. Stack tracking attaches metadata to every object created by Flux so resources removed from the "Config Repo" can be pruned.

Every object created or updated by Flux is assigned a label carrying the stack's name, e.g. flux.weave.works/stack: default. It also gets a checksum that is an aggregate of all resources in that stack: flux.weave.works/stack-checksum: cf23df2207d99a74fbe169e3eba035e633b65d94.

After cluster changes have been applied, a full list of cluster resources is retrieved by the stack label. Any resource whose stack-checksum does not match the most recently applied checksum is pruned.
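As a rough illustration of that pruning rule (a sketch, not the code in this PR; the Resource type and IDs below are made up, only the label names come from the description above):

```go
package main

import "fmt"

const (
	stackLabel    = "flux.weave.works/stack"
	checksumLabel = "flux.weave.works/stack-checksum"
)

// Resource is a stand-in for a cluster object as the sync machinery sees it.
type Resource struct {
	ID     string            // e.g. "default:deployment/helloworld"
	Labels map[string]string // metadata.labels
}

// stale returns the flux-labelled resources whose stack-checksum no longer
// matches the checksum that was just applied, i.e. the candidates for pruning.
func stale(inCluster []Resource, appliedChecksum string) []Resource {
	var prune []Resource
	for _, r := range inCluster {
		if _, managed := r.Labels[stackLabel]; !managed {
			continue // not created by flux; leave it alone
		}
		if r.Labels[checksumLabel] != appliedChecksum {
			prune = append(prune, r)
		}
	}
	return prune
}

func main() {
	cluster := []Resource{
		{ID: "default:deployment/removed-from-git", Labels: map[string]string{stackLabel: "default", checksumLabel: "old"}},
		{ID: "default:deployment/still-in-git", Labels: map[string]string{stackLabel: "default", checksumLabel: "new"}},
		{ID: "kube-system:deployment/not-managed-by-flux"},
	}
	for _, r := range stale(cluster, "new") {
		fmt.Println("would prune:", r.ID)
	}
}
```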

Current Test Image

You can test this PR by using the following Docker image: timer/flux-with-stacks:feature-stack-tracking-1d481d0.
Outdated.

TODOs

  • Put behind a formal flag, e.g. --experimental-deletion
  • cluster.Cluster's Sync method should probably receive a more appropriate object than
    2x map[string]policy.Update
  • cluster.applyMetadata should be refactored and simplified
  • cluster.ExportByLabel should probably be given a few failure/retry cases to handle; what about when we're limited by RBAC?
  • Skipping resources (v1:ComponentStatus and v1:Endpoints) shouldn't happen at the listing level, but probably at the delete/diff level
  • Should sync.garbageCollect mark the object for deletion and only prune it next run? This could help with any race conditions in the K8s API server (I didn't run into any, but you never know).
  • Logger is broken in sync.Sync so fmt.Printf had to be used. We should figure this out.
  • Tests, tests, tests.
  • flux.weave.works/immortal

Post-merge follow-ups

  • Multi-stack support to minimize blast radius
  • Delete K8s resources in Background propagation mode

[1]: an arbitrary collection of k8s resources

@oliviabarrick (Contributor)

Can you fill out the description a little bit more to describe what this does? I’m not exactly sure what “stack tracking” means or what a “stack” is in this context.

@hiddeco (Member) commented Oct 11, 2018

@justinbarrick see #738 (comment) and #738 (comment)

@oliviabarrick (Contributor)

What is the purpose of the checksum? It seems to me that if we just added a label to the resources (any label really would work), you could still sweep for resources that have the label but do not match any defined resource in the repository based on type / namespace / name. AFAIK this is how Helm works.

I can see checksums being useful if we just want to know if we need to update a resource, but I would expect just using kubectl gives something similar to us already.

@Timer force-pushed the feature/stack-tracking branch 3 times, most recently from 8acf3da to 066c2da on October 11, 2018 at 23:50
@squaremo (Member)

What is the purpose of the checksum? It seems to me that if we just added a label to the resources (any label really would work), you could still sweep for resources that have the label but do not match any defined resource in the repository based on type / namespace / name. AFAIK this is how Helm works.

Yes, true. When you have access to the names of the things that should be there, all you need to know for deletion is whether it was created by flux in the first place. So the checksum is of use only if you are concerned about reapplying manifests unnecessarily.

However: we deliberately reapply manifests even if it looks unnecessary, to revert changes that have been made out-of-band. For example, if someone patches a resource with kubectl behind flux's back, we want to fairly promptly reassert the definition from the source of truth (i.e., git).

We still might want to report on things that look out of date, so I'm inclined to leave the checksums in, even if we don't use them in control flow.
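For illustration, a minimal sketch of using the checksum purely for drift reporting, as suggested here (the aggregation scheme, SHA-1 over resources sorted by ID, is an assumption, not necessarily what the PR does):

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"sort"
)

// stackChecksum aggregates all manifests in a stack into one checksum.
// Sorting by resource ID keeps the result stable across file ordering.
func stackChecksum(manifests map[string][]byte) string {
	ids := make([]string, 0, len(manifests))
	for id := range manifests {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	h := sha1.New()
	for _, id := range ids {
		h.Write([]byte(id))
		h.Write(manifests[id])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	inGit := map[string][]byte{
		"default:deployment/helloworld": []byte("kind: Deployment\n..."),
	}
	expected := stackChecksum(inGit)
	observed := "cf23df2207d99a74fbe169e3eba035e633b65d94" // checksum label read from the cluster
	if observed != expected {
		fmt.Println("stack looks out of date; report it, without using the checksum for control flow")
	}
}
```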

@squaremo (Member) left a review:

I've done an initial scan over this, and made some comments on where things seem a bit out of place. It's all up for discussion :-)

Review threads (all outdated and resolved): cluster/kubernetes/kubernetes.go (two), sync/sync.go (two), cluster/cluster.go
@squaremo changed the title from "[WIP] Stack tracking" to "[WIP] Delete resources no longer in git" on Oct 19, 2018
@squaremo (Member)

  • Put behind a formal flag, e.g. --experimental-deletion

I've added a flag --sync-garbage-collection to switch on the deletion part of the syncing.

  • cluster.Cluster's Sync method should probably receive a more appropriate object than
    2x map[string]policy.Update

Resources are collected into named stacks and given a checksum before being passed to cluster.Sync. (The stacks and checksums are passed as part of the SyncDef argument).
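For readers following along, a guessed shape of what that could look like (the actual definitions live in cluster/cluster.go and may well differ):

```go
// Package cluster: a sketched shape for how stacks and checksums might travel
// with the sync definition; the real types in cluster/cluster.go may differ.
package cluster

// Resource is a placeholder for flux's resource type.
type Resource interface {
	ResourceID() string
	Bytes() []byte
}

// SyncStack names one stack, carries its aggregate checksum, and lists the
// resources that should exist for it once the sync has been applied.
type SyncStack struct {
	Name      string // value for the flux.weave.works/stack label
	Checksum  string // value for the flux.weave.works/stack-checksum label
	Resources []Resource
}

// SyncDef replaces the two maps of policy updates: it carries the stacks
// (alongside the existing apply actions) into cluster.Sync.
type SyncDef struct {
	Stacks []SyncStack
}
```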

  • cluster.applyMetadata should be refactored and simplified

I rewrote it to do the minimal amount of {de,en}coding.

  • cluster.ExportByLabel should probably be given a few failure/retry cases to handle; what about when we're limited by RBAC?

Good question; I've punted on that for the minute.

  • Skipping resources (v1:ComponentStatus and v1:Endpoints) shouldn't happen at the listing level, but probably at the delete/diff level

Anything that wasn't given a stack label in the first place will be left alone. There will need to be additional code to deal with anything that is created by another controller and given the stack label (possibly examining the ownerReferences).

  • Should sync.garbageCollect mark the object for deletion and only prune it next run? This could help with any race conditions in the K8s API server (I didn't run into any, but you never know).

It might make any race conditions less likely to bite, but it wouldn't remove them. In any case, I think once kubectl apply has succeeded, the resource can be considered durably updated.

  • Logger is broken in sync.Sync so fmt.Printf had to be used. We should figure this out.

I've replaced the printfs with logger.Log. What may have been happening is that Log will return an error if it's invoked with an odd number of arguments (since it expects label, value pairs); and we ignore the return value, so it fails silently.
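A minimal sketch of the "check the return value" point, using a go-kit logfmt logger (flux's logging is go-kit style; the key/value pairs and the fallback here are illustrative):

```go
package main

import (
	"fmt"
	"os"

	"github.com/go-kit/kit/log"
)

func main() {
	logger := log.NewLogfmtLogger(os.Stdout)

	// Log returns an error; discarding it is what makes failures silent.
	// Checking it (or wrapping the logger so every call is checked) at least
	// makes them visible.
	if err := logger.Log("resource", "default:deployment/helloworld", "action", "delete"); err != nil {
		fmt.Fprintf(os.Stderr, "logging failed: %v\n", err)
	}
}
```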

  • Tests, tests, tests.

Yup. I need to bring the existing tests up to date; then we'll have to put in new ones that cover the various scenarios of moving, removing, and adding resources.

  • flux.weave.works/immortal

I think flux.weave.works/ignore would suffice for now; but, it may be a requirement for some people that a resource is updated (not ignored) while it is represented in the git repo, but not deleted if it's removed.

@squaremo (Member)

While working on those commits, I did start wondering why stacks exist. I accept the argument for limiting the "blast radius" of changes (so e.g., every time you change a label somewhere, it doesn't result in every single resource being updated). But why not limit the blast radius to a single resource?

@Timer (Contributor, Author) commented Oct 24, 2018

Thanks for the extensive updates! Sorry I don't have more time to work on this right now. 😅

A per-resource hash would probably work well. This was originally based on prior art that had the stack concept, more akin to Helm charts. Here, the user is defining their resources directly rather than having them generated arbitrarily, so a per-resource hash should fit nicely.

@squaremo (Member) commented Nov 8, 2018

I've rebased this and made sure the tests run and pass. The latter meant losing some coverage -- specifically, testing whether deleting a resource results in the cluster resource being deleted. I'll have to build up the tests in cluster/kubernetes to cover that.

@Timer (Contributor, Author) commented Nov 17, 2018

Anything I can help with to push this along? :-)

I think I have some free time coming up.

@Timer (Contributor, Author) commented Nov 17, 2018

Upgraded to this new version in my cluster and it's working great!

@squaremo (Member)

Anything I can help with to push this along? :-)

I've added some test scaffolding in sync_test.go. This exercises pretty much the bare minimum of the sync (and deletion). What would really be useful is to expand the cases covered, especially the corner cases. For example, I have no tests checking what happens with ignored resources (in fact I don't think they are accounted for in the code, so we'll need to come up with the semantics as well).
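By way of example, here is the shape such a case could take, written against a toy fake cluster rather than the real scaffolding in sync_test.go (everything below is illustrative):

```go
package sync_test

import "testing"

// fakeCluster is a toy stand-in for the mock cluster used by the real
// scaffolding: it remembers what was applied and reports what it would prune.
type fakeCluster struct {
	applied map[string]bool
}

// sync applies everything in repo and garbage-collects anything previously
// applied that is no longer present.
func (c *fakeCluster) sync(repo map[string]string) (deleted []string) {
	next := map[string]bool{}
	for id := range repo {
		next[id] = true
	}
	for id := range c.applied {
		if !next[id] {
			deleted = append(deleted, id)
		}
	}
	c.applied = next
	return deleted
}

// TestRemovedManifestIsDeleted: sync once, drop a manifest from the "repo",
// sync again, and check that the corresponding resource is garbage-collected.
func TestRemovedManifestIsDeleted(t *testing.T) {
	c := &fakeCluster{applied: map[string]bool{}}
	repo := map[string]string{
		"default:deployment/helloworld": "manifest",
		"default:service/helloworld":    "manifest",
	}
	c.sync(repo)

	delete(repo, "default:service/helloworld")
	deleted := c.sync(repo)

	if len(deleted) != 1 || deleted[0] != "default:service/helloworld" {
		t.Fatalf("expected the removed service to be pruned, got %v", deleted)
	}
}
```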

@squaremo (Member)

Idea: use a sanitised git repo URL as the stack name. That means that when flux supports multiple repos, it will be able to do syncs piece-wise, a repo at a time, and not delete something that simply came from another repo.

Technically, we can partition and name the stacks however we like. But it might be nice to have a label saying which repo a resource came from, for other purposes. And it's useful for the stack to be less than or equal to the unit of syncing; otherwise there might be glitches where, say, a resource gets deleted because it has changed stack but that change hasn't been applied yet. (Syncing per repo will still have this issue if a resource moves from one repo to another -- but since repos are updated independently, syncing all repos at once will too.)
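A sketch of what sanitising a repo URL into a stack name could look like, assuming Kubernetes label-value rules (alphanumerics plus '-', '_', '.', at most 63 characters); the function name and the exact scheme are made up for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// stackNameFromRepoURL reduces a git URL to something usable as a Kubernetes
// label value: only [A-Za-z0-9._-], trimmed at the ends, max 63 characters.
func stackNameFromRepoURL(url string) string {
	mapped := strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
			return r
		case r == '.', r == '-', r == '_':
			return r
		default:
			return '-' // replace ':', '/', '@', etc.
		}
	}, url)
	mapped = strings.Trim(mapped, "-._")
	if len(mapped) > 63 {
		mapped = mapped[len(mapped)-63:] // keep the most specific suffix
		mapped = strings.Trim(mapped, "-._")
	}
	return mapped
}

func main() {
	fmt.Println(stackNameFromRepoURL("git@github.com:fluxcd/flux-get-started.git"))
	// Output: git-github.com-fluxcd-flux-get-started.git
}
```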

@squaremo (Member)

@Timer I have a rebased branch that squashes the commits you marked as fix: ... into their predecessors. Mind if I force-push it?

@Timer (Contributor, Author) commented Nov 27, 2018

Feel free to force push anything you'd like. :-)

@squaremo force-pushed the feature/stack-tracking branch 2 times, most recently from 1d629b0 to 9aa11a1 on December 20, 2018 at 14:06
@squaremo (Member) commented Dec 20, 2018

I think this is ready for release as an experimental feature (i.e., can be reviewed and merged).

To address the TODOs:

  • Put behind a formal flag, e.g. --experimental-deletion
  • cluster.Cluster's Sync method should probably receive a more appropriate object than 2x map[string]policy.Update
  • cluster.applyMetadata should be refactored and simplified
  • Should sync.garbageCollect mark the object for deletion and only prune it next run? This could help with any race conditions in the K8s API server (I didn't run into any, but you never know).
  • Logger is broken in sync.Sync so fmt.Printf had to be used. We should figure this out.
  • Tests, tests, tests.
  • flux.weave.works/immortal

The annotation "flux.weave.works/ignore" will effectively make something immortal -- flux won't apply it, and (if on a resource in the cluster) won't delete it, even if it's removed from the repo.
These are explained in comments above (or commit comments).

  • cluster.ExportByLabel should probably be given a few failure/retry cases to handle; what about when we're limited by RBAC?

Accounting for the possibilities is a bit mind-boggling. I think it might be easier to release this as an experimental feature, and try it in real environments (by which I mean disposable environments).

  • Skipping resources (v1:ComponentStatus and v1:Endpoints) shouldn't happen at the listing level, but probably at the delete/diff level

I'm not convinced we have to skip any resources at all, for correctness. We can generalise or embellish this later if necessary, though.

@squaremo changed the title from "[WIP] Delete resources no longer in git" to "Delete resources no longer in git" on Dec 20, 2018
@squaremo (Member) commented Jan 2, 2019

@rndstr Do you mind having a look at this? I won't hold you to a definitive approval, unless you want to give one :-)

@rndstr (Contributor) commented Jan 3, 2019

@squaremo I likely won't be able to give it a proper look before next Tuesday

Review comment from a Contributor on sync/sync.go (outdated), on this hunk:

    logger.Log("resource", res.ResourceID(), "ignore", "apply")
    return
    continue

nit: noop

Other review threads: sync/sync.go (outdated, resolved); cluster/kubernetes/sync.go (resolved)
@rndstr (Contributor) left a review:

played around with it some more locally and it works great.

@squaremo (Member)

Brilliant, thanks for the review Roli! I appreciate you taking the time to check this out :-)

I had another play myself and I have noticed something: resources that aren't namespaced will get deleted even if they are among the files. The cause is that when loaded from files, resources get given the "default" namespace if they don't specify one. But when loaded from the cluster, they will either have a namespace (because they've been created in a namespace implicitly) or have an empty "" namespace. Since we identify resources using the namespace (and the kind and the name), when the garbage collection goes to look at which resources are in the cluster but not in the files, it will think they are different, and delete them.

Assigning the namespace "default" to resources missing a namespace is wrong, since they can end up being in another namespace; but so is leaving that field empty, or giving them a sentinel value (e.g., <default>), for the same reason. Finding out the default namespace in the kubectl config and using that will have a similar problem for un-namespaced resources -- they will come back from the cluster with an empty namespace.

I don't think there's anything for it other than to figure out which resources are supposed to have a namespace, and filling in the default value. Ideally we'd be able to do that in a place we're already querying for resources.
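One place that information is available is the Kubernetes discovery API; a sketch using client-go (an illustration of the approach, not necessarily how the eventual fix was implemented):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

// namespacedKinds asks the API server which resource kinds are namespaced, so
// that manifests without a namespace can be given the default one only when
// they actually need it; cluster-scoped kinds are left alone.
func namespacedKinds(dc discovery.DiscoveryInterface) (map[string]bool, error) {
	kinds := map[string]bool{}
	_, lists, err := dc.ServerGroupsAndResources()
	if err != nil {
		return nil, err
	}
	for _, list := range lists {
		for _, r := range list.APIResources {
			kinds[list.GroupVersion+"/"+r.Kind] = r.Namespaced
		}
	}
	return kinds, nil
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	kinds, err := namespacedKinds(dc)
	if err != nil {
		panic(err)
	}
	fmt.Println("v1/Namespace namespaced?", kinds["v1/Namespace"]) // false
}
```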

@2opremio force-pushed the feature/stack-tracking branch 2 times, most recently from f11b18c to 1d86d57 on February 26, 2019 at 14:34
@squaremo mentioned this pull request on Feb 26, 2019
@2opremio force-pushed the feature/stack-tracking branch 2 times, most recently from 08b4ff5 to a6f0142 on February 26, 2019 at 16:41
@2opremio (Contributor)

I think all the comments have been addressed. The only missing thing is documentation.

Review thread (outdated, resolved): site/faq.md
@squaremo merged commit d76ecab into fluxcd:master on Feb 27, 2019
@theduke commented Mar 6, 2019

Just wanted to give a shoutout and thanks to @Timer and @squaremo , implementation of this feature is much appreciated!

squaremo added a commit that referenced this pull request Mar 21, 2019
The PR #1442 introduced code to determine which namespace, if any, each
manifest belongs to. To distinguish between resources that need a
namespace but don't have one, and resources that are cluster-scoped,
it introduced the sentinel value `<cluster>` for the latter.

Regrettably, I didn't accompany this with code for _parsing_ those
sentinel values, since I reasoned that it would only be used
internally. But the sync events generated by fluxd include a list of
changed resources, and those inevitably will include things like
namespaces that are cluster-scoped. The result is that fluxd will
generate events that cannot then be parsed by the receiver.

This commit fixes that by recognising `<cluster>` as a namespace when
parsing resource IDs.
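A sketch of the parsing side of that fix, assuming flux's namespace:kind/name ID format (the real parser lives in flux's resource ID package; this is only an illustration):

```go
package main

import (
	"fmt"
	"strings"
)

const clusterScope = "<cluster>"

// parseResourceID splits an ID of the form "<namespace>:<kind>/<name>",
// treating the sentinel "<cluster>" as "no namespace" for cluster-scoped
// resources, as the commit above describes.
func parseResourceID(id string) (namespace, kind, name string, err error) {
	colon := strings.Index(id, ":")
	if colon < 0 {
		return "", "", "", fmt.Errorf("malformed resource ID: %q", id)
	}
	namespace, rest := id[:colon], id[colon+1:]
	slash := strings.Index(rest, "/")
	if slash < 0 {
		return "", "", "", fmt.Errorf("malformed resource ID: %q", id)
	}
	kind, name = rest[:slash], rest[slash+1:]
	if namespace == clusterScope {
		namespace = "" // cluster-scoped: no namespace
	}
	return namespace, kind, name, nil
}

func main() {
	fmt.Println(parseResourceID("<cluster>:namespace/kube-system"))
	fmt.Println(parseResourceID("default:deployment/helloworld"))
}
```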
2opremio added two commits to 2opremio/flux that referenced this pull request Apr 4, 2019, and hiddeco pushed a commit that referenced this pull request Apr 4, 2019:

This fixes a regression introduced by #1442 in which applying the `tag_all`
pseudo policy failed for workload manifests in which the namespace is omitted.

The effective namespace of the workload wasn't set after parsing, causing a
mismatch in the resource identifier (the parsed resource identifier was
cluster-scoped due to the lack of explicit namespace in the manifest).