Multiple reloads of haproxy config without any apparent changes #745

Closed
mhyllander opened this issue Mar 1, 2021 · 7 comments

@mhyllander

After recently migrating from haproxy-ingress 0.7.6 to 0.12.1 (!), I saw a change in behavior when rolling out a new version of a service deployment: haproxy reloaded its configuration multiple times during the rollout, apparently without any changes to the configuration.

We have a service with very long-lived websocket connections, so we have configured haproxy-ingress to avoid restarting haproxy and dropping those connections as much as possible. We use the reusesocket reload strategy, dynamic scaling, and DNS resolvers. We also set timeout-stop to 24h so that existing connections are kept alive and most clients have time to re-connect over the following 24 hours. (Without this we would get a burst of clients re-connecting at the same time.)

What happened is that our service with 30 pods was rolled out 6 pods at a time. At each rollout step, haproxy forked another instance, so the haproxy-ingress pod's memory usage grew in steps, and of course the count of current connections restarted from zero each time.

Note also that I am running an "external" haproxy in a side-car.

Expected behavior

Since backend server discovery is done through DNS lookups, haproxy.cfg does not change, so there should be no need to restart haproxy. (Even without DNS lookups, dynamic scaling should allow updates without restarting haproxy.) This is the way it worked in 0.7.6.
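
For clarity, dynamic scaling means the controller fills or drains the pre-allocated server slots through the HAProxy Runtime API instead of reloading. A rough sketch of such a runtime update on the admin socket (assuming socat is available in the container; the backend, slot and address below are purely illustrative):

# Illustrative only: point slot srv1 of one backend at a new pod IP and enable it.
echo "set server platform_re-worker-service_http/srv1 addr 10.0.0.12 port 8080" | \
    socat stdio /var/run/haproxy/admin.sock
echo "set server platform_re-worker-service_http/srv1 state ready" | \
    socat stdio /var/run/haproxy/admin.sock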

Steps to reproduce the problem

Environment information

HAProxy Ingress version: v0.12.1

Command-line options:

  - args:
    - --configmap=core/re-ingress-haproxy
    - --ingress-class=re
    - --master-socket=/var/run/haproxy/master.sock
    - --sort-backends
    - --default-backend-service=core/re-ingress-haproxy-default-backend
    - --default-ssl-certificate=core/re-ingress-cert
    - --reload-strategy=reusesocket
    - --watch-ingress-without-class=true

Global options:

data:
  backend-server-slots-increment: "100"
  config-frontend: |
    #-------------------------------------------------------------
    capture request header Host len 40
    capture request header User-Agent len 100
    capture request header X-Forwarded-For len 100
    #-------------------------------------------------------------
  dns-hold-obsolete: 30s
  dns-hold-valid: 30s
  dns-resolvers: kubernetes=10.128.0.10
  dynamic-scaling: "true"
  forwardfor: update
  healthz-port: "10253"
  http-log-format: '%ci:%cp\ [%t]\ %ft\ %b/%s\ %sslv\ %sslc\ %Tq/%Tw/%Tc/%Tr/%Tt\
    %ST\ %B\ %U\ %CC\ %CS\ %tsc\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %hr\ %hs\ %{+Q}r'
  max-connections: "500000"
  prometheus-port: "9101"
  ssl-ciphers: ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
  ssl-dh-param: core/re-ingress-secrets
  ssl-options: no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
  stats-port: "1936"
  syslog-endpoint: stdout
  syslog-format: raw
  tcp-log-format: '%ci:%cp\ [%t]\ %ft\ %b/%s\ %sslv\ %sslc\ %Tw/%Tc/%Tt\ %B\ %U\ %ts\
    %ac/%fc/%bc/%sc/%rc\ %sq/%bq'
  timeout-client: 2m
  timeout-stop: 24h

Ingress objects:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/use-resolver: kubernetes
    kubernetes.io/ingress.class: re
    meta.helm.sh/release-name: re-worker-service
    meta.helm.sh/release-namespace: platform
  labels:
    app: re-worker-service
    app.kubernetes.io/managed-by: Helm
    appType: finagle
    chart: re-worker-service-0.1.0
    heritage: Helm
    release: re-worker-service
  name: re-worker-service
  namespace: platform
spec:
  rules:
  - host: api-groot-prod.hivestreaming.com
    http:
      paths:
      - backend:
          serviceName: re-worker-service
          servicePort: http
        path: /v1/selftest
        pathType: Prefix
      - backend:
          serviceName: re-worker-service
          servicePort: admin
        path: /admin/ping
        pathType: Prefix
  - host: peers.hivestreaming.com
    http:
      paths:
      - backend:
          serviceName: re-worker-service
          servicePort: http
        path: /v1/selftest
        pathType: Prefix
      - backend:
          serviceName: re-worker-service
          servicePort: admin
        path: /admin/ping
        pathType: Prefix
  tls:
  - hosts:
    - api-groot-prod.hivestreaming.com
    - peers.hivestreaming.com
    secretName: re-worker-service-cert
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/agent-check-port: "9991"
    ingress.kubernetes.io/initial-weight: "100"
    ingress.kubernetes.io/timeout-tunnel: 60s
    ingress.kubernetes.io/use-resolver: kubernetes
    kubernetes.io/ingress.class: re
    meta.helm.sh/release-name: re-worker-service
    meta.helm.sh/release-namespace: platform
  labels:
    app: re-worker-service
    app.kubernetes.io/managed-by: Helm
    appType: finagle
    chart: re-worker-service-0.1.0
    heritage: Helm
    release: re-worker-service
  name: re-worker-service-websocket
  namespace: platform
spec:
  rules:
  - host: api-groot-prod.hivestreaming.com
    http:
      paths:
      - backend:
          serviceName: re-worker-service-websocket
          servicePort: websocket
        path: /v1/peer
        pathType: Prefix
  - host: peers.hivestreaming.com
    http:
      paths:
      - backend:
          serviceName: re-worker-service-websocket
          servicePort: websocket
        path: /v1/peer
        pathType: Prefix
  tls:
  - hosts:
    - api-groot-prod.hivestreaming.com
    - peers.hivestreaming.com
    secretName: re-worker-service-cert

This results in the following backend configurations:

backend platform_re-worker-service-websocket_8001
    mode http
    balance roundrobin
    timeout tunnel 60s
    acl https-request ssl_fc
    http-request redirect scheme https if !https-request
    option forwardfor
    http-response set-header Strict-Transport-Security "max-age=15768000"
    server-template srv 100 re-worker-service-websocket.platform.svc.cluster.local:8001 resolvers kubernetes resolve-prefer ipv4 init-addr none weight 100 check inter 2s agent-check agent-port 9991
backend platform_re-worker-service_admin
    mode http
    balance roundrobin
    acl https-request ssl_fc
    http-request redirect scheme https if !https-request
    option forwardfor
    http-response set-header Strict-Transport-Security "max-age=15768000"
    server-template srv 100 _admin._tcp.re-worker-service.platform.svc.cluster.local resolvers kubernetes resolve-prefer ipv4 init-addr none weight 1 check inter 2s
backend platform_re-worker-service_http
    mode http
    balance roundrobin
    acl https-request ssl_fc
    http-request redirect scheme https if !https-request
    option forwardfor
    http-response set-header Strict-Transport-Security "max-age=15768000"
    server-template srv 100 _http._tcp.re-worker-service.platform.svc.cluster.local resolvers kubernetes resolve-prefer ipv4 init-addr none weight 1 check inter 2s
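
Since these backends rely on the kubernetes resolver, a quick way to see what the server-templates resolve to is to query the records directly (a sketch, assuming dig is available; the resolver IP is taken from the dns-resolvers setting above):

dig +short @10.128.0.10 _http._tcp.re-worker-service.platform.svc.cluster.local SRV
dig +short @10.128.0.10 re-worker-service-websocket.platform.svc.cluster.local A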

In an haproxy container, it looked like this after the rollout:

$ kprod -n core exec re-ingress-haproxy-2zrcx -c haproxy -it -- sh
/ # ps -f
PID   USER     TIME  COMMAND
    1 root      0:01 haproxy -sf 120 117 103 100 97 94 91 88 37 34 31 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
    7 root      6:44 haproxy -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   10 root      0:42 haproxy -sf 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   13 root      0:37 haproxy -sf 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   16 root      0:35 haproxy -sf 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   19 root      0:34 haproxy -sf 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   22 root      0:34 haproxy -sf 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   25 root      0:33 haproxy -sf 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   28 root      0:33 haproxy -sf 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   31 root      0:35 haproxy -sf 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   34 root      0:29 haproxy -sf 31 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   37 root      6:53 haproxy -sf 34 31 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   88 root      0:19 haproxy -sf 37 34 31 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   91 root      0:17 haproxy -sf 88 37 34 31 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   94 root      2:31 haproxy -sf 91 88 37 34 31 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
   97 root      0:11 haproxy -sf 94 91 88 37 34 31 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
  100 root      0:11 haproxy -sf 97 94 91 88 37 34 31 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
  103 root      2:10 haproxy -sf 100 97 94 91 88 37 34 31 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
  117 root      0:04 haproxy -sf 103 100 97 94 91 88 37 34 31 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
  120 root      0:05 haproxy -sf 117 103 100 97 94 91 88 37 34 31 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
  123 root      1:13 haproxy -sf 120 117 103 100 97 94 91 88 37 34 31 28 25 22 19 16 13 10 7 -x /var/run/haproxy/admin.sock -W -db -W -S /var/run/haproxy/master.sock,mode,600 -f /etc/haproxy
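
The same accumulation of old processes can also be inspected through the master socket (a sketch, assuming socat is available in the haproxy container):

# "show proc" on the master CLI lists the master plus current and old workers.
echo "show proc" | socat stdio /var/run/haproxy/master.sock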

Here's a snapshot of the number of connections during the rollout (there were some problems with the service initially). You can see how haproxy forks a new instance in all haproxy-ingress pods with each step of the deployment rollout. (Note also that there appear to be many more haproxy instances running in the example above than just four or five.)

[Image: graph of current connections per haproxy-ingress pod during the rollout, dropping to zero at each reload]

@jcmoraisjr
Owner

Please add the --v=2 command-line option (the default verbosity is very low) and paste the messages logged during such reloads. This will give us an idea of why haproxy-ingress is forking a new proxy instance.
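
For reference, the flag goes into the controller container's args; a sketch assuming the controller runs as a DaemonSet named re-ingress-haproxy in the core namespace (workload kind and name are guessed from the pod name above):

kubectl -n core edit daemonset re-ingress-haproxy
# then append to the haproxy-ingress container's args:
#   - --v=2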

@mhyllander
Author

Thanks, I'll do that and get back to you with the results.

@mhyllander
Author

@jcmoraisjr I have sent the log file to your gmail address. As I mention in the mail, I have verified that the config does not change between haproxy restarts:

/ # cd /etc/haproxy/
/etc/haproxy # ls -l
total 28
-rw-r--r--    1 root     root         15267 Mar  2 00:37 haproxy.cfg
drwxr-xr-x    2 root     root          4096 Mar  2 00:12 lua
drwxr-xr-x    2 root     root          4096 Mar  2 00:12 maps
-rw-r--r--    1 root     root           677 Mar  2 00:37 spoe-modsecurity.conf
/etc/haproxy # cp -p haproxy.cfg /tmp/haproxy.cfg.0037
/etc/haproxy # ls -l
total 28
-rw-r--r--    1 root     root         15267 Mar  2 00:45 haproxy.cfg
drwxr-xr-x    2 root     root          4096 Mar  2 00:12 lua
drwxr-xr-x    2 root     root          4096 Mar  2 00:12 maps
-rw-r--r--    1 root     root           677 Mar  2 00:45 spoe-modsecurity.conf
/etc/haproxy # cp -p haproxy.cfg /tmp/haproxy.cfg.0045
/etc/haproxy # diff /tmp/haproxy.cfg.0037 /tmp/haproxy.cfg.0045
/etc/haproxy # ls -l maps
total 12
-rw-r--r--    1 root     root           585 Mar  2 00:12 _front_bind_crt.list
-rw-r--r--    1 root     root          2318 Mar  2 00:12 _front_http_host__prefix.map
-rw-r--r--    1 root     root          2318 Mar  2 00:12 _front_https_host__prefix.map

@jcmoraisjr
Owner

I can reproduce this. DNS-based updates are not properly updating the internal tracking of the server-template size. I've just pushed quay.io/jcmoraisjr/h:v0.12.2-r1, which probably fixes this (note this is a temporary, locally built image).

You might also consider giving Kubernetes endpoint-based updates a chance by removing the use-resolver config. This strategy is (or should be) properly updating the internal tracking of empty slots and should not reload during a rolling update.

Out of curiosity - why use DNS-based updates? Endpoints should update faster because the controller watches the k8s API, and haproxy is also updated without the need to reload.
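
If you want to try it, a minimal sketch of switching one Ingress to endpoint-based updates is simply removing the use-resolver annotation (the trailing dash deletes an annotation):

kubectl -n platform annotate ingress re-worker-service-websocket \
    ingress.kubernetes.io/use-resolver-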

@mhyllander
Author

Honestly, I don't remember now if there was a specific reason why we chose DNS discovery. We had been using haproxy for a long time in a VM environment, and when we moved to Kubernetes we initially tried nginx-ingress. But it was too unstable: it used varying amounts of memory, the dynamic updates were not reliable, and it was not able to reload its config without dropping existing connections. So I set out to bring in haproxy-ingress and haproxy instead, and submitted several updates to the incubator/haproxy-ingress chart to bring it on par with the stable/nginx-ingress chart.

Also we found we needed the dynamic weight-based load balancing provided by haproxy's agent-check to be able to spread long-lived websocket connections evenly over a number of pods, without overloading any of them.

My main goal was to be able to do dynamic updates with as few restarts as possible, while not dropping existing connections. I had read about haproxy being able to do DNS lookups (A and SRV) to discover backend servers, and it sounded like a good alternative, so I guess that's why we went with that.

So endpoint based updates would also provide dynamic updates, without unnecessary config reloads? Are there any drawbacks?

@jcmoraisjr
Owner

and submitted several updates to the incubator/haproxy-ingress chart to bring it to par with the stable/nginx-ingress chart

Great! You should also know that the history was preserved

I had read about haproxy being able to do DNS lookups (A and SRV) to discover backend servers, and it sounded like a good alternative

It's true, and it's the only doable alternative in static environments. But it's just another strategy when you have a discovery system like Kubernetes, a dynamically configurable proxy like HAProxy with its Runtime API, and a controller that can read from one and apply to the other in a fast and safe way. Looking at the big picture, DNS sounds a bit ... old =)

So endpoint based updates would also provide dynamic updates, without unnecessary config reloads? Are there any drawbacks?

Yep, endpoint-based updates are also dynamic and use native support - no addons, no memory leak. I'm not aware of any drawback; on the contrary, I'd say it's safer and faster. Safer because we have used it for ages on pretty large and noisy clusters with tens of rolling updates every day - I'm pretty close to the SRE team, taking care of logs and metrics, looking for misbehavior like unexpected reloads. Faster because it's almost instantaneous, without the need to rely on DNS updates.

while not dropping existing connections

In case you haven't found worker-max-reloads yet, it can help you increase timeout-stop without running out of memory.
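
A sketch of setting it in the global ConfigMap shown above; the worker-max-reloads value here is just an illustrative guess:

kubectl -n core patch configmap re-ingress-haproxy --type merge \
    -p '{"data":{"worker-max-reloads":"5"}}'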

@jcmoraisjr jcmoraisjr added this to the v0.10 milestone Mar 8, 2021
@jcmoraisjr jcmoraisjr modified the milestones: v0.10, v0.12 Mar 15, 2021
@jcmoraisjr
Owner

v0.12.2, v0.11.5 and v0.10.6 were just released and fix this issue. Closing.
