
Argo CD auto-sync probably incorrectly deletes app resources #2573

Closed
2 tasks done
alexmt opened this issue Oct 26, 2019 · 5 comments
Labels
bug/priority:urgent (Bug should be fixed immediately)
bug/severity:criticial (A critical bug in ArgoCD, possibly resulting in data loss or severe degraded overall functionality)
bug (Something isn't working)
component:core (Syncing, diffing, cluster state cache)
Comments

@alexmt
Collaborator

alexmt commented Oct 26, 2019

Checklist:

Describe the bug

It seems like Argo CD auto-sync with resource pruning might incorrectly delete app resources. Related Slack conversation: https://argoproj.slack.com/archives/CASHNF6MS/p1571850948417500

Hi, we are having issues with Argo. We use Argo to deploy a bunch of apps to a Kubernetes cluster. As part of the re-sync, most of the apps running on the cluster get redeployed. Can someone suggest what's the right way to debug this in Argo?

To be more exact, the Argo Application remains, but the child objects (Deployments, Pods, Services, Ingresses) get recreated on the next pass. If it helps, Purge is set to true. This usually happens after re-sync.

It happens a few times every day, but we couldn't figure out why. At one point we just see the Application go into a Missing state, and then it tries to recreate everything, but we can't figure out what's causing the deletion. Something that seems to happen at the same time is:
rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 172.20.195.180:8081: connect: no route to host"

While running with --loglevel debug I noticed something:
When all’s well I see this =>
Sync/0 resource extensions/Ingress:sample/foo-service-0-ingress obj->obj
Right before pruning starts =>
Sync/0 resource extensions/Ingress:sample/foo-service-0-ingress obj->nil
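
For context, the setup described above corresponds to an Application with automated sync and pruning enabled. A minimal sketch of such a spec (the name, path, and repository URL below are placeholders, not values taken from the report):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: foo-service-0          # placeholder name, not from the report
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/apps.git   # placeholder repository
    targetRevision: HEAD
    path: foo-service-0                          # placeholder path
  destination:
    server: https://kubernetes.default.svc
    namespace: sample
  syncPolicy:
    automated:
      prune: true    # "Purge is set to true" above: auto-sync is allowed to delete resources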

To Reproduce

Still unable to reproduce

Version

v1.2.3
@alexmt alexmt added the bug Something isn't working label Oct 26, 2019
@alexmt alexmt modified the milestones: v1.4, v1.3 Oct 26, 2019
@alexmt alexmt modified the milestones: v1.3, v1.4 Dec 10, 2019
@alexmt alexmt modified the milestones: v1.4, v1.5 Jan 23, 2020
@alexmt alexmt modified the milestones: v1.5, v1.6 Feb 18, 2020
@alexmt alexmt added bug/severity:criticial A critical bug in ArgoCD, possibly resulting in data loss or severe degraded overall functionality bug/priority:urgent Bug should be fixed immediately labels Apr 23, 2020
@jannfis jannfis added the component:core Syncing, diffing, cluster state cache label May 14, 2020
@alexmt alexmt modified the milestones: v1.6 GitOps Engine, v1.7 May 20, 2020
@ash2k
Member

ash2k commented Jul 7, 2020

Can this be caused by argoproj/gitops-engine#79 ?

@Kampe
Contributor

Kampe commented Jul 15, 2020

I'm seeing this behavior on the initial sync of my repository.

@darshanime
Member

@Kampe can you reproduce it reliably? If so, do share the steps to reproduce.

@Kampe
Contributor

Kampe commented Jul 15, 2020

Unsure yet, I will check soon; however, here's the log output:

time="2020-07-15T03:27:12Z" level=info msg="appResyncPeriod=3m0s"
time="2020-07-15T03:27:12Z" level=info msg="Application Controller (version: v1.6.1+159674e, built: 2020-06-19T00:41:18Z) starting (namespace: argocd)"
time="2020-07-15T03:27:12Z" level=info msg="Starting configmap/secret informers"
time="2020-07-15T03:27:12Z" level=info msg="Configmap/secret informer synced"
time="2020-07-15T03:27:12Z" level=info msg="Refreshing app status (comparison expired. reconciledAt: never, expiry: 3m0s), level (2)" application=my-application
time="2020-07-15T03:27:12Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: default)" application=my-application
time="2020-07-15T03:27:12Z" level=info msg="Changing cluster cache settings to: {map[] 0xc0008ebda0}" server="https://10.43.0.1:443"
time="2020-07-15T03:27:12Z" level=info msg="Start syncing cluster" server="https://10.43.0.1:443"
time="2020-07-15T03:27:12Z" level=info msg="0xc0007ef1a0 subscribed to settings updates"
time="2020-07-15T03:27:13Z" level=info msg="Cluster successfully synced" server="https://10.43.0.1:443"
time="2020-07-15T03:27:16Z" level=info msg="Normalized app spec: {\"status\":{\"conditions\":[{\"lastTransitionTime\":\"2020-07-15T03:27:12Z\",\"message\":\"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp 10.43.92.55:8081: connect: connection refused\\\"\",\"type\":\"ComparisonError\"}]}}" application=my-application
time="2020-07-15T03:27:16Z" level=info msg="Skipping auto-sync: application status is Unknown" application=my-application
time="2020-07-15T03:27:16Z" level=info msg="Updated sync status:  -> Unknown" application=my-application dest-namespace=default dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal
time="2020-07-15T03:27:16Z" level=info msg="Updated health status:  -> Healthy" application=my-application dest-namespace=default dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal
time="2020-07-15T03:27:16Z" level=info msg="Update successful" application=my-application

It should probably be noted that I'm installing Argo CD with the argocd-operator, which is itself installed via OLM.

@alexmt alexmt modified the milestones: v1.7, v1.8 Aug 25, 2020
@alexmt
Collaborator Author

alexmt commented Nov 3, 2020

Reproduced this behavior while upgrading Argo CD: #4423

The issue was fixed by #4708.
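
As a stop-gap for anyone still on an affected version (a suggestion on my part, not something prescribed in this thread), automatic pruning can be disabled so that auto-sync cannot delete resources on its own:

spec:
  syncPolicy:
    automated:
      prune: false   # auto-sync still applies changes but will not delete resources;
                     # pruning can then be done deliberately via a manual sync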
