
ARC Finalizers Cause a deadlock when uninstalling AutoscalingRunnerSet in ArgoCD #3440

Open · rteeling-evernorth opened this issue Apr 15, 2024 · 16 comments · May be fixed by #3447
Labels
bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode

Comments

@rteeling-evernorth

rteeling-evernorth commented Apr 15, 2024

Controller Version

0.9.0

Deployment Method

ArgoCD

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Install the ARC AutoscalingRunnerSet as an ArgoCD app
2. Uninstall the ArgoCD app

Describe the bug

The ArgoCD app cannot be deleted because Argo tries to delete resources that are normally deleted by the ARC controller, and their deletion is blocked by the actions.github.com/cleanup-protection finalizer. The affected resources (see the kubectl sketch after this list) include:

  • AutoscalingRunnerSet
  • kube mode service account
  • kube mode role
  • rs manager role
  • kube mode role binding
  • rs manager role binding
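
For anyone hitting this, the stuck finalizer can be confirmed with something like the following (the resource name and namespace below are placeholders; adjust to your install):

  kubectl get autoscalingrunnerset arc-runner-set --namespace=arc-runner-set -o jsonpath='{.metadata.finalizers}'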

Describe the expected behavior

The chart should uninstall cleanly when I delete the ArgoCD app

Additional Context

I suspect this may be resolvable with Helm/Argo annotations on the affected resources; I will test with a fork of the Helm chart

Controller Logs

My employer's open source contribution policy forbids me from creating public Gists. I will provide redacted logs upon request via zip/tarball.

Runner Pod Logs

My employer's open source contribution policy forbids me from creating public Gists. I will provide redacted logs upon request via zip/tarball.
@rteeling-evernorth rteeling-evernorth added bug Something isn't working gha-runner-scale-set Related to the gha-runner-scale-set mode needs triage Requires review from the maintainers labels Apr 15, 2024
@rteeling-evernorth rteeling-evernorth changed the title ARC Finalizers Cause a deadlock when unisntalling AutoscalingRunnerSet ARC Finalizers Cause a deadlock when uninstalling AutoscalingRunnerSet in ArgoCD Apr 15, 2024
@rteeling-evernorth
Author

I was able to fix the issue using the ArgoCD annotations argocd.argoproj.io/sync-options: Delete=false and argocd.argoproj.io/sync-wave: "1". I'll open a PR
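
For reference, the annotations look roughly like this on each affected resource (the resource name below is a placeholder):

metadata:
  name: arc-runner-set-gha-rs-manager # placeholder
  annotations:
    argocd.argoproj.io/sync-options: Delete=false
    argocd.argoproj.io/sync-wave: "1"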

@rteeling-evernorth rteeling-evernorth linked a pull request Apr 16, 2024 that will close this issue
@rteeling-evernorth
Author

@nikola-jokic could I trouble you for your two cents on this?

@ahatzz11

ahatzz11 commented Apr 23, 2024

@rteeling-evernorth Thanks for opening an issue for this - I've run into it on nearly every upgrade and end up having to turn off syncing, manually delete the finalizers, and then turn syncing back on, after which things sort themselves out. I've also noticed that even when a new controller comes up (say 0.9.1 when upgrading from 0.9.0), it doesn't seem to properly handle the finalizers on scale-set resources that are still on the old version.

I do use sync-waves on the top level resources to ensure the gha-runner-scale-set-controller syncs first and the gha-runner-scale-set-figure-linux-standard syncs second:

metadata:
  name: gha-runner-scale-set-controller
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1" # before scale-set

---

metadata:
  name: gha-runner-scale-set-figure-linux-standard
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "2" # after controller

Looking at your PR, you set argocd.argoproj.io/sync-wave: "1" - would that conflict in my case, with my controller having the same sync wave? Would it make sense to go negative on the sync-wave for the autoscalingrunnerset to ensure it always runs first?

@rteeling-evernorth
Author

rteeling-evernorth commented Apr 24, 2024

@ahatzz11 I had the same issue with having to manually resolve the finalizer deadlock. I've installed the controller in an entirely separate ArgoCD app, since the cardinality of the scale-set controller to scale sets is 1:N. I assume GitHub distributes the controller and runner-set charts separately because of this. You should split them into separate ArgoCD apps, for ease of maintenance if nothing else.

Argo sync waves are only relevant within the context of a single Argo app, so as long as the two charts are managed by separate apps, you won't see any conflict. Once that's done, you just uninstall any scale-set apps first, then uninstall the controller, and you won't hit any deadlock issues.
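
For example, with placeholder app names, the teardown order with the argocd CLI would be (cascading deletion is the default, so each app's resources are removed with it):

  argocd app delete arc-runner-set
  argocd app delete arc-controller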

Hope that helps!

@ahatzz11

@rteeling-evernorth Cool, thanks for the explanation - I do have each chart managed as a separate app, so we should be good to go there!

@rteeling-evernorth
Author

rteeling-evernorth commented Apr 30, 2024

@nikola-jokic

What are your thoughts on this issue and corresponding PR?

@ahatzz11

ahatzz11 commented May 6, 2024

@nikola-jokic @Link- Is it possible to get this issue/PR looked at for the next release? It would be awesome to have upgrades massively simplified for ArgoCD users.

@Link-
Member

Link- commented May 6, 2024

Thanks for adding the appropriate labels. It's on our radar now.

@rteeling-evernorth
Author

@ahatzz11 did my change work for you? I'd prefer not to cause an "it works on my machine" sort of problem.

@nikola-jokic nikola-jokic removed the needs triage Requires review from the maintainers label May 22, 2024
@rteeling-evernorth
Author

@Link- @nikola-jokic Just curious - what's going on with this? The associated PR has been sitting in the hopper for a while now; is there anything I can do, within reason, to help get this released?

@rteeling-evernorth
Author

@nikola-jokic @Link- @rentziass

This issue has been sitting stale for a few months now. Would it be possible to get the associated PR on the next release?

@nikola-jokic
Contributor

Hey everyone,

After discussing this issue with the team, we decided not to apply annotations at this time. We are aware that many of you are using ArgoCD to manage ARC and scale sets, but we don't have capacity to support it at this time. We would recommend that you maintain your own chart if it requires such modifications until we officially support it. Thank you @rteeling-evernorth, for raising the PR to fix it, and thank you for describing this problem so thoroughly! It will help other people apply changes to their own charts.

@AlexandreODelisle

@nikola-jokic, @rteeling-evernorth ,

Since there is no intention to provide ArgoCD support at this time, could a possibility be to support annotations on all Kubernetes objects managed by the chart?

In the end, if we were able to set annotations everywhere, this could be handled in the values without having to fork the chart.

@SimonWoidig

@nikola-jokic, @rteeling-evernorth ,

Since there is no intention to provide ArgoCD support at this time, could a possibility be to support annotations on all Kubernetes objects managed by the chart?

In the end, if we were able to set annotations everywhere, this could be handled in the values without having to fork the chart.

This is the usual solution: giving the chart user the choice of annotations. In my opinion, that's the best solution for this kind of problem. 👍
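
For what it's worth, a hypothetical values-level knob (not something the chart exposes today; the key name below is made up) could look something like:

  # hypothetical values.yaml key - not part of the current chart
  extraAnnotations:
    argocd.argoproj.io/sync-options: Delete=false
    argocd.argoproj.io/sync-wave: "1"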

@rteeling-evernorth
Author

The problem here is that the ArgoCD annotations are not applied to everything - because this is ultimately an order-of-operations issue, the ability to selectively apply the annotations to specific resources is needed. I agree a custom annotation value would be useful, but it would not solve this issue on its own.

Additionally, given that the Argo annotations are effectively inert if the resource is not actively being managed by Argo, I don't see why this would have any adverse effect when installing via kubectl/Helm/Flux/whatever else. If we really want to split hairs, I can update the PR to put the Argo annotations behind a feature flag if that would make the maintainers happy. Thoughts, @nikola-jokic?
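
As a rough sketch of what that flag could look like in the chart templates (flags.argocdAnnotations is a made-up value name, not part of the current chart):

  # hypothetical: gate the Argo annotations behind a chart value
  {{- if .Values.flags.argocdAnnotations }}
  annotations:
    argocd.argoproj.io/sync-options: Delete=false
    argocd.argoproj.io/sync-wave: "1"
  {{- end }}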

@lifeofmoo

I use Helm + TF to deploy two separate apps into Argo. If I want to destroy the runner set, I get TF timeouts due to the finalizers, so I have to manually run the following commands to be able to delete the arc-runners namespace.

I did add argocd.argoproj.io/sync-options: Delete=false, which is able to "remove" the app from Argo, but my TF seems to hang on deleting the namespace, which is created using the CreateNamespace=true sync option.

kubectl patch autoscalingrunnerset arc-runner-set --namespace=arc-runner-set --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'

kubectl patch rolebinding arc-runner-set-gha-rs-manager --namespace=arc-runner-set --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
 
kubectl patch role arc-runner-set-gha-rs-manager --namespace=arc-runner-set --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
 
kubectl patch serviceaccount arc-runner-set-gha-rs-no-permission --namespace=arc-runner-set --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'

I could have TF (not Argo) create the NS, wrap these 4 patches in a TF wrapper, and then delete the NS as a workaround during a destroy. However, I am keen to see if there is another way.
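
For reference, wrapping those 4 patches in a single loop (e.g. for a TF local-exec on destroy; resource names and namespace as in the commands above) would look roughly like:

  for target in autoscalingrunnerset/arc-runner-set rolebinding/arc-runner-set-gha-rs-manager role/arc-runner-set-gha-rs-manager serviceaccount/arc-runner-set-gha-rs-no-permission; do
    kubectl patch "$target" --namespace=arc-runner-set --type=json -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
  done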
