Unable to deploy ArgoCD with HA #11388
Comments
In my tests, a vanilla HA installation of v2.5.2 and an upgrade to it (v2.5.1 -> v2.5.2, for example) both fail at the
Sounds a little off that the redis-ha-server component is waiting for itself...?
I'm having the same issue in my namespaced HA install; it seems similar to a previous problem with Redis and IPv6. After adding bind 0.0.0.0 to the sentinel and redis.conf configuration, the DB starts fine, but HAProxy still shows 0 masters available, and argocd-server is complaining of a timeout against the database.
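For readers hitting the same thing, here is a minimal sketch of the kind of bind change described above, assuming the stock HA install keeps redis.conf and sentinel.conf in the argocd-redis-ha-configmap ConfigMap (the name, keys, and surrounding settings are assumptions; compare with the manifests you actually deploy):

```yaml
# Hedged sketch only: add an explicit IPv4 bind for Redis and Sentinel.
# ConfigMap name and keys are assumed from the stock HA manifests; the
# generated files contain many more settings that must be kept as-is.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-redis-ha-configmap
  namespace: argocd
data:
  redis.conf: |
    bind 0.0.0.0
    port 6379
    # ...remaining generated Redis settings unchanged...
  sentinel.conf: |
    bind 0.0.0.0
    port 26379
    # ...remaining generated Sentinel settings unchanged...
```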
I'm also having a similar issue when using ArgoCD HA v2.5.2.
If everyone could please provide a few additional details about your particular cluster setup in your comments, that would help.
Same issue here.
Happening with v2.5.1 and v2.5.2.
I had the issue in v2.5.1 and v2.5.2; I had to roll back to 2.4.6, where it is working fine.
I created PR #11418; if you could please test the HA manifest in a dev environment and provide feedback. It is based on the master branch and is not suitable for production. IPv6-only environments will not be compatible. I will also conduct testing on my side over the next few days.
My results:
Most of the time, the status of the failing pods is
I can confirm that this is solved with 2.5.3. Thank you!
Can also confirm this is fixed for me with 2.5.3 |
I tried @34fathombelow's solution. Now the pods are starting, but I still have an issue with Redis. From the redis pods:
The HAProxy pods start failing but eventually come up:
argocd-server shows the following errors all the time:
I just found this issue. I'm trying to upgrade from 2.4.17 to 2.5.5 and running into the original error. Should I just follow this issue and try again when I see it closed, or do you need some help testing/validating the fix? Thanks!
#5957 feels related. We also see the same issue with an IPv4 cluster on TKG. EDIT: Confirmed, adding
fix(redis): explicit bind to redis and sentinel for IPv4 clusters (#11388) (#11862)
* fix(redis): explicit bind to redis and sentinel for IPv4 clusters #11388
* fix(redis): run manifests generate
* Retrigger CI pipeline
Signed-off-by: rumstead <rjumstead@gmail.com>
Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Hi @crenshaw-dev, I just wanted to report that we're still facing the issue with version 2.5.6 and an HA setup. We just upgraded our Argo dev instance from v2.4.8 to 2.5.6 via
Tue Jan 17 09:05:44 UTC 2023 Start...
I am also experiencing the same issue @FrittenToni describes above.
Same problem with 2.5.10 on OKD 4.12. The argocd-redis-ha-server starts up fine in 2.4.19 but fails on 2.5.10.
Same here. The only 2.5.x version that's working is v2.5.3+0c7de21.
Same here. Failing on 2.5.6, 2.5.10, and 2.6.1 deployments.
Has anyone tried 2.6.2?
Just did, same result.
Not sure if this was anyone else's problem, but for my specific issue, I was scaling argocd-redis-ha from 3 to 5 replicas. The chart only deploys 3 argocd-redis-ha-announce Services, so I had to deploy two additional ones.
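For anyone scaling the same way, a hypothetical sketch of one of the two extra announce Services, modeled on the argocd-redis-ha-announce-0/1/2 Services the manifests already generate (names, labels, and ports here are assumptions; copy an existing announce Service from your cluster and adjust the index rather than trusting this verbatim):

```yaml
# Hypothetical fourth announce Service for a redis-ha StatefulSet scaled to 5.
# Each announce Service targets exactly one pod via the pod-name label that
# StatefulSets add automatically.
apiVersion: v1
kind: Service
metadata:
  name: argocd-redis-ha-announce-3
  namespace: argocd
spec:
  publishNotReadyAddresses: true   # announce pods even before they report Ready
  selector:
    app.kubernetes.io/name: argocd-redis-ha
    statefulset.kubernetes.io/pod-name: argocd-redis-ha-server-3
  ports:
    - name: tcp-server
      port: 6379
      targetPort: 6379
    - name: tcp-sentinel
      port: 26379
      targetPort: 26379
```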
I noticed that this issue appeared when we upgraded our cluster to k8s v1.23; getent hosts cannot resolve anything in the cluster.local domain.
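A quick way to double-check in-cluster DNS like the comment above describes is a throwaway pod; this sketch uses busybox's nslookup rather than getent, and the service name it resolves is an assumption based on the stock HA manifests:

```yaml
# Hypothetical one-shot pod to verify that cluster.local names resolve from
# inside the cluster. Check its logs after it completes, then delete it.
apiVersion: v1
kind: Pod
metadata:
  name: dns-check
  namespace: argocd
spec:
  restartPolicy: Never
  containers:
    - name: dns-check
      image: busybox:1.36
      command: ["nslookup", "argocd-redis-ha.argocd.svc.cluster.local"]
```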
It seems that the network policies argocd-redis-ha-proxy-network-policy and argocd-redis-ha-server-network-policy have to be reviewed. After deleting both policies, everything started to work. I checked that no other network policy defines ports for DNS; only the two above have port 53 defined, which is incorrect for OpenShift. Changing the UDP/TCP ports to 5353 brought everything back to life.
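To make the OpenShift workaround above concrete, here is a hedged sketch of just the DNS egress portion of one of the two policies with the port switched from 53 to 5353; the pod selector label and the rest of the policy are abbreviated assumptions, so merge this with the full policy from the manifests rather than replacing it outright:

```yaml
# Sketch of the DNS egress rule of argocd-redis-ha-proxy-network-policy with
# UDP/TCP 53 changed to 5353 for OpenShift's DNS. The real policy contains
# additional ingress/egress rules that are omitted here.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-redis-ha-proxy-network-policy
  namespace: argocd
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: argocd-redis-ha-haproxy   # label assumed
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: UDP
          port: 5353
        - protocol: TCP
          port: 5353
```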
Nice find @rimasgo! I verified this works for our deployment as well via kustomize changes against v2.6.2.
2.6.7 with OKD 4.12.0 (k8s 1.25.0) doesn't seem to work for me either (using this manifest). Similar to @kilian-hu-freiheit, the redis-ha StatefulSet and Deployment pods never spin up. It appears to be a securityContext issue, but having tried changing many of the securityContext variables (and granting 'anyuid' to the project), it still doesn't want to boot the Redis servers/proxy. Luckily, using 2.4.x works.
This fixed the problem for us when upgrading 2.4 -> 2.6.
Stopping by to add where my issue with this symptom came from. It had to do with the Kubernetes networking setup and the HA Redis setup's assumption of IPv4 networking. My cluster was configured in dual-stack mode for IPv4 and IPv6. The IPv6 address range was first in the cluster specification, so it is the IP listed in places that don't show all IPs. Effectively, if a
I suspect also changing the
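Not something proposed in this thread, but if dual-stack ordering is the culprit, one Kubernetes-level option worth testing is pinning the Redis-facing Services to IPv4; the Service name and ports below are assumptions based on the stock HA manifests:

```yaml
# Hypothetical: force a single-stack IPv4 Service in a dual-stack cluster
# where the IPv6 range is listed first. The same idea would apply to the
# announce and haproxy Services.
apiVersion: v1
kind: Service
metadata:
  name: argocd-redis-ha
  namespace: argocd
spec:
  ipFamilyPolicy: SingleStack
  ipFamilies:
    - IPv4
  selector:
    app.kubernetes.io/name: argocd-redis-ha
  ports:
    - name: tcp-server
      port: 6379
      targetPort: 6379
    - name: tcp-sentinel
      port: 26379
      targetPort: 26379
```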
Both argocd-redis-ha-server and argocd-redis-ha-haproxy were unable to start in ArgoCD 2.7.10. We were updating from 2.3.12 -> 2.7.10. The services started after removing the NetworkPolicies.
redis-ha-server:
haproxy:
There are indeed two issues:
(this is potentially insecure, but it works...). With this, the HA Redis pods are running "fine".
This is certainly a big issue. I am running Argo CD on EKS 1.24. In my Argo CD module, network policies do not exist, so I have nothing to delete, and my cluster is purely IPv4, so there is no solution there either.
Here is how I solved my version of this issue.
Issue: when using

```yaml
redis-ha:
  enabled: true
```

the HAProxy pod fails with the following errors:

```
[ALERT] (1) : Binding [/usr/local/etc/haproxy/haproxy.cfg:9] for proxy health_check_http_url: cannot create receiving socket (Address family not supported by protocol) for [:::8888]
[ALERT] (1) : Binding [/usr/local/etc/haproxy/haproxy.cfg:56] for frontend ft_redis_master: cannot create receiving socket (Address family not supported by protocol) for [:::6379]
[ALERT] (1) : Binding [/usr/local/etc/haproxy/haproxy.cfg:77] for frontend stats: cannot create receiving socket (Address family not supported by protocol) for [:::9101]
[ALERT] (1) : [haproxy.main()] Some protocols failed to start their listeners! Exiting.
```

Cause and solution: I am running a Rancher RKE2 on-premise cluster which has IPv4/IPv6 dual-stack networking enabled. However, it looks like IPv6 was not correctly enabled or configured for the cluster. The redis-ha HAProxy binds to IPv6 by default. In my case it worked to disable this setting by supplying the following values:

```diff
 redis-ha:
   enabled: true
+  haproxy:
+    IPv6:
+      enabled: false
```
Did you find a solution? I'm having the same issues.
We see this as well with 2.7.7.
It works for me.
We are still having issues with the HA setup. We are using v2.10.12+cb6f5ac. If we shut down one zone and try to sync in ArgoCD, it gets stuck in "waiting to start". No errors are reported in any logs. This is a major issue, because we cannot do anything in our production environment without ArgoCD: we are running on hosted Kubernetes, and our only "admin" access is ArgoCD.
In our case, we had to restart CoreDNS and the Cilium agents; after that, the HA worked properly. I hope this helps someone.
Possibly related: Without
Checklist:
I've pasted the output of argocd version.
Describe the bug
ArgoCD is unable to deploy correctly with HA. This happens in the namespace of the ArgoCD installation.
To Reproduce
Upgrade from 2.4.6 to 2.5.1 or 2.5.2.
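For anyone reproducing this, a minimal Kustomize sketch for pinning the HA manifest version (the remote-resource pattern follows the Argo CD install docs; swapping the tag from v2.4.6 to v2.5.2 reproduces the upgrade described here):

```yaml
# kustomization.yaml - hedged sketch; point it at the tag you want to test.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: argocd
resources:
  - https://raw.githubusercontent.com/argoproj/argo-cd/v2.5.2/manifests/ha/install.yaml
```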
Expected behavior
ArgoCD is upgraded/deployed successfully
Version
2.5.2 and 2.5.1 (same issue on both versions)
Logs
ha proxy:
redis ha:
repository server: