Unable to deploy ArgoCD with HA #11388
Comments
In my tests, a vanilla HA installation of v2.5.2 and an upgrade to it (v2.5.1 -> v2.5.2, for example) both fail at the
Sounds a little off that the redis-ha-server component is waiting for itself...?
I'm having the same issue in my namespaced HA install; it seems similar to a previous problem with Redis and IPv6. After adding bind 0.0.0.0 to the sentinel and redis.conf configuration, the DB starts fine, but HAProxy still shows 0 masters available, and argocd-server is complaining of a timeout against the database.
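For readers hitting the same thing, here is a minimal sketch of the kind of bind change described above, assuming the stock HA install keeps redis.conf and sentinel.conf in the argocd-redis-ha-configmap ConfigMap (the name, keys, and surrounding settings are assumptions; compare with the manifests you actually deploy):

```yaml
# Hedged sketch only: add an explicit IPv4 bind for Redis and Sentinel.
# ConfigMap name and keys are assumed from the stock HA manifests; the
# generated files contain many more settings that must be kept as-is.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-redis-ha-configmap
  namespace: argocd
data:
  redis.conf: |
    bind 0.0.0.0
    port 6379
    # ...remaining generated Redis settings unchanged...
  sentinel.conf: |
    bind 0.0.0.0
    port 26379
    # ...remaining generated Sentinel settings unchanged...
```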
I'm also having a similar issue when using ArgoCD HA v2.5.2.
If everyone could please provide a few additional details about your particular cluster setup in your comments, that would help.
Same issue here.
Happening with v2.5.1 and v2.5.2.
I had the issue in v2.5.1 and v2.5.2; I had to roll back to 2.4.6, where it is working fine.
I created PR #11418; if you could please test the HA manifest in a dev environment and provide feedback. It is based on the master branch and is not suitable for production. IPv6-only environments will not be compatible. I will also conduct testing on my side over the next few days.
My results:
Most of the time, the status of the failing pods is
I can confirm that this is solved with 2.5.3. Thank you!
Can also confirm this is fixed for me with 2.5.3 |
I tried @34fathombelow's solution. Now the pods are starting, but I still have an issue with Redis. From the redis pods:
The HAProxy pods start failing but eventually come up:
argocd-server shows the following errors all the time:
I just found this issue. I'm trying to upgrade from 2.4.17 to 2.5.5 and running into the original error. Should I just follow this issue and try again when I see it closed, or do you need some help testing/validating the fix? Thanks!
#5957 feels related. We also see the same issue with an IPv4 cluster on TKG. EDIT: Confirmed, adding
fix(redis): explicit bind to redis and sentinel for IPv4 clusters (#11388) (#11862)
* fix(redis): explicit bind to redis and sentinel for IPv4 clusters #11388
* fix(redis): run manifests generate
* Retrigger CI pipeline
Signed-off-by: rumstead <rjumstead@gmail.com>
Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Hi @crenshaw-dev, I just wanted to report that we're still facing the issue with version 2.5.6 and an HA setup. We just upgraded our Argo dev instance from v2.4.8 to 2.5.6 via
Tue Jan 17 09:05:44 UTC 2023 Start...
I am also experiencing the same issue @FrittenToni describes above.
Same problem with 2.5.10 on OKD 4.12. The argocd-redis-ha-server starts up fine in 2.4.19 but fails on 2.5.10.
Same here. The only 2.5.x version that's working is v2.5.3+0c7de21.
Same here. Failing on 2.5.6, 2.5.10, and 2.6.1 deployments.
Has anyone tried 2.6.2?
Just did, same result.
Not sure if this was anyone else's problem, but for my specific issue, I was scaling argocd-redis-ha from 3 to 5 replicas. The chart only deploys 3 argocd-redis-ha-announce Services, so I had to deploy two additional ones.
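For anyone scaling the same way, a hypothetical sketch of one of the two extra announce Services, modeled on the argocd-redis-ha-announce-0/1/2 Services the manifests already generate (names, labels, and ports here are assumptions; copy an existing announce Service from your cluster and adjust the index rather than trusting this verbatim):

```yaml
# Hypothetical fourth announce Service for a redis-ha StatefulSet scaled to 5.
# Each announce Service targets exactly one pod via the pod-name label that
# StatefulSets add automatically.
apiVersion: v1
kind: Service
metadata:
  name: argocd-redis-ha-announce-3
  namespace: argocd
spec:
  publishNotReadyAddresses: true   # announce pods even before they report Ready
  selector:
    app.kubernetes.io/name: argocd-redis-ha
    statefulset.kubernetes.io/pod-name: argocd-redis-ha-server-3
  ports:
    - name: tcp-server
      port: 6379
      targetPort: 6379
    - name: tcp-sentinel
      port: 26379
      targetPort: 26379
```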
I noticed that this issue appeared when we upgraded our cluster to k8s v1.23; getent hosts cannot resolve anything in the cluster.local domain.
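A quick way to double-check in-cluster DNS like the comment above describes is a throwaway pod; this sketch uses busybox's nslookup rather than getent, and the service name it resolves is an assumption based on the stock HA manifests:

```yaml
# Hypothetical one-shot pod to verify that cluster.local names resolve from
# inside the cluster. Check its logs after it completes, then delete it.
apiVersion: v1
kind: Pod
metadata:
  name: dns-check
  namespace: argocd
spec:
  restartPolicy: Never
  containers:
    - name: dns-check
      image: busybox:1.36
      command: ["nslookup", "argocd-redis-ha.argocd.svc.cluster.local"]
```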
It seems that the network policies argocd-redis-ha-proxy-network-policy and argocd-redis-ha-server-network-policy have to be reviewed. After deleting both policies, everything started to work. I checked that no other network policy defines ports for DNS; only the two above have port 53 defined, which is incorrect for OpenShift. Changing the UDP/TCP ports to 5353 brought everything back to life.
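To make the OpenShift workaround above concrete, here is a hedged sketch of just the DNS egress portion of one of the two policies with the port switched from 53 to 5353; the pod selector label and the rest of the policy are abbreviated assumptions, so merge this with the full policy from the manifests rather than replacing it outright:

```yaml
# Sketch of the DNS egress rule of argocd-redis-ha-proxy-network-policy with
# UDP/TCP 53 changed to 5353 for OpenShift's DNS. The real policy contains
# additional ingress/egress rules that are omitted here.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-redis-ha-proxy-network-policy
  namespace: argocd
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: argocd-redis-ha-haproxy   # label assumed
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: UDP
          port: 5353
        - protocol: TCP
          port: 5353
```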
Nice find @rimasgo! I verified this works for our deployment as well via kustomize changes against v2.6.2.
2.6.7 with OKD 4.12.0 (k8s 1.25.0) doesn't seem to work for me either (using this manifest). Similar to @kilian-hu-freiheit, the redis-ha StatefulSet and Deployment pods never spin up. It appears to be a securityContext issue, but having tried changing many of the securityContext variables (and granting 'anyuid' to the project), it still doesn't want to boot the Redis servers/proxy. Luckily, using 2.4.x works.
This fixed the problem for us when upgrading 2.4 -> 2.6.
Stopping by to add where my issue with this symptom came from. It had to do with the Kubernetes networking setup and the HA Redis setup's assumption of IPv4 networking. My cluster was configured in dual-stack mode for IPv4 and IPv6. The IPv6 address range was first in the cluster specification, so it is the IP listed in places that don't show all IPs. Effectively, if a
I suspect also changing the
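Not something proposed in this thread, but if dual-stack ordering is the culprit, one Kubernetes-level option worth testing is pinning the Redis-facing Services to IPv4; the Service name and ports below are assumptions based on the stock HA manifests:

```yaml
# Hypothetical: force a single-stack IPv4 Service in a dual-stack cluster
# where the IPv6 range is listed first. The same idea would apply to the
# announce and haproxy Services.
apiVersion: v1
kind: Service
metadata:
  name: argocd-redis-ha
  namespace: argocd
spec:
  ipFamilyPolicy: SingleStack
  ipFamilies:
    - IPv4
  selector:
    app.kubernetes.io/name: argocd-redis-ha
  ports:
    - name: tcp-server
      port: 6379
      targetPort: 6379
    - name: tcp-sentinel
      port: 26379
      targetPort: 26379
```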
Both argocd-redis-ha-server and argocd-redis-ha-haproxy were unable to start in ArgoCD 2.7.10. We were updating from 2.3.12 -> 2.7.10. The services started after removing the NetworkPolicies.
redis-ha-server:
haproxy:
There are indeed two issues:
(this is potentially insecure, but it works...). With this, the HA Redis pods are running "fine".
This is certainly a big issue. I am running Argo CD on EKS 1.24. In my Argo CD module, network policies do not exist, so I have nothing to delete, and my cluster is purely IPv4, so there is no solution there either.
Here is how I solved my version of this issue.
Issue: when using

```yaml
redis-ha:
  enabled: true
```

the HAProxy pod fails with the following errors:

```
[ALERT] (1) : Binding [/usr/local/etc/haproxy/haproxy.cfg:9] for proxy health_check_http_url: cannot create receiving socket (Address family not supported by protocol) for [:::8888]
[ALERT] (1) : Binding [/usr/local/etc/haproxy/haproxy.cfg:56] for frontend ft_redis_master: cannot create receiving socket (Address family not supported by protocol) for [:::6379]
[ALERT] (1) : Binding [/usr/local/etc/haproxy/haproxy.cfg:77] for frontend stats: cannot create receiving socket (Address family not supported by protocol) for [:::9101]
[ALERT] (1) : [haproxy.main()] Some protocols failed to start their listeners! Exiting.
```

Cause and solution: I am running a Rancher RKE2 on-premise cluster which has IPv4/IPv6 dual-stack networking enabled. However, it looks like IPv6 was not correctly enabled or configured for the cluster. The redis-ha HAProxy binds to IPv6 by default. In my case it worked to disable this setting by supplying the following values:

```diff
 redis-ha:
   enabled: true
+  haproxy:
+    IPv6:
+      enabled: false
```
Did you find a solution? I'm having the same issues.
We see this as well with 2.7.7.
It works for me.
We are still having issues with the HA setup. We are using v2.10.12+cb6f5ac. If we shut down one zone and try to sync in ArgoCD, it gets stuck in "waiting to start". No errors are reported in any logs. This is a major issue, because we cannot do anything in our production environment without ArgoCD: we are running on hosted Kubernetes, and our only "admin" access is ArgoCD.
In our case, we had to restart CoreDNS and the Cilium agents; after that, the HA worked properly. I hope this helps someone.
Possibly related: Without
Checklist:
I've pasted the output of argocd version.
Describe the bug
ArgoCD is unable to deploy correctly with HA. This happens in the namespace of the ArgoCD installation.
To Reproduce
Upgrade from 2.4.6 to 2.5.1 or 2.5.2.
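For anyone reproducing this, a minimal Kustomize sketch for pinning the HA manifest version (the remote-resource pattern follows the Argo CD install docs; swapping the tag from v2.4.6 to v2.5.2 reproduces the upgrade described here):

```yaml
# kustomization.yaml - hedged sketch; point it at the tag you want to test.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: argocd
resources:
  - https://raw.githubusercontent.com/argoproj/argo-cd/v2.5.2/manifests/ha/install.yaml
```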
Expected behavior
ArgoCD is upgraded/deployed successfully
Version
2.5.2 and 2.5.1 (same issue on both versions)
Logs
ha proxy:
redis ha:
repository server: