Failed to get the sockets from the old process #696

frogluo · 2020-11-11T02:31:25Z

Description of the problem

My haproxy-ingress deployment on k8s had been running for a long time, but today
its logs showed:

W1111 02:08:41.422283       6 instance.go:457] output from haproxy:
[WARNING] 315/020841 (78) : We didn't get the expected number of sockets (expecting 253 got 0)
[NOTICE] 315/020841 (78) : haproxy version is 2.2.4-de45672
[ALERT] 315/020841 (78) : Failed to get the sockets from the old process!
E1111 02:08:41.422330       6 instance.go:301] error reloading server:
exit status 1

After that, services on k8s that had been updated (only the deployment image was changed) could not be reached and returned HTTP status code 502.
Services (deployments/pods) that were not updated kept working normally.

With --reload-strategy=native there is no problem,
so I think maybe something is wrong with reusesocket?

Environment information

k8s 1.15.7
HAProxy Ingress version: v0.12.1
v0.9 had the problem, so I upgraded to v0.12.1, which has it as well.

It's an R&D environment, so pods change very frequently.

Command-line options:

- name: haproxy-ingress
  image: xxx/k8s/haproxy-ingress:v0.12.1
  args:
  - --default-backend-service=kube-system/ingress-default-backend
  - --configmap=kube-system/haproxy-ingress
  - --sort-backends
  - --rate-limit-update=0.05
  - --tcp-services-configmap=kube-system/tcp-services

Global options:

apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-ingress
  namespace: kube-system
data:
  forwardfor: "ifmissing"
  max-connections: "65535"
  timeout-stop: "1800s"
  timeout-client: "1800s"
  timeout-server: "1800s"
  # nbthread: "4"
  use-haproxy-user: "true"
  slots-min-free: "1"

jcmoraisjr · 2020-11-11T12:19:19Z

Hi, thanks for the detailed description. This is related to reading and reusing the listening sockets to perform a seamless reload; see haproxy's -x command-line option.

A few questions below:

Is the old process still running, and is the unix socket still working after the reload failure? E.g. does echo "show info" | socat - /var/run/haproxy/admin.sock properly show haproxy info?
Does this happen on every restart, or only after running and successfully restarting for a while?
Does anything change if use-haproxy-user is "false"?
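
The first check can be scripted. This is a minimal sketch, assuming socat is available in the controller container and the admin socket is at the path shown in the logs. The helper name check_admin_socket is hypothetical and not part of haproxy-ingress.

```shell
#!/bin/sh
# check_admin_socket is a hypothetical helper, not part of haproxy-ingress.
# It succeeds only if the path is a unix socket and haproxy answers "show info".
check_admin_socket() {
    sock="${1:-/var/run/haproxy/admin.sock}"
    # -S is true only for an existing unix socket file.
    if [ ! -S "$sock" ]; then
        echo "not a socket: $sock" >&2
        return 1
    fi
    # A responding haproxy starts its "show info" output with a "Name:" line.
    echo "show info" | socat - "$sock" | grep -q '^Name:'
}
```

Running `check_admin_socket && echo "old process reachable"` right after a failed reload would show whether the old process is still serving its socket.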

frogluo · 2020-11-24T08:29:44Z

Sorry for the late reply. The problem hasn't appeared again since last time, so I couldn't run the checks you suggested.

But today there is another problem. On one node, the haproxy-ingress log (v0.12.1) shows:

E1123 11:43:37.828224       6 instance.go:188] error reading admin socket: error connecting to unix socket /var/run/haproxy/admin.sock: dial unix /var/run/haproxy/admin.sock: connect: connection refused
E1123 11:43:38.328723       6 instance.go:188] error reading admin socket: error connecting to unix socket /var/run/haproxy/admin.sock: dial unix /var/run/haproxy/admin.sock: connect: connection refused
E1123 11:43:38.829045       6 instance.go:188] error reading admin socket: error connecting to unix socket /var/run/haproxy/admin.sock: dial unix /var/run/haproxy/admin.sock: connect: connection refused

The reason is a port conflict.

calico-node was using a port within my configured range, and I then assigned the same port to a TCP service, which triggered the error.

==============
Maybe last time had the same cause: a port conflict that made the reload fail.
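
A conflict like this can be looked for before a port is assigned to a TCP service. This is a rough sketch, assuming `ss` from iproute2 is available on the node. The helper name port_in_use is hypothetical; it reads `ss -ltn` style output from stdin, so it can also be tried against saved output.

```shell
#!/bin/sh
# port_in_use is a hypothetical helper. It reads `ss -ltn` style output on
# stdin and succeeds if some listener's local address ends in :<port>.
port_in_use() {
    awk -v port="$1" '
        # Column 4 of `ss -ltn` is the local address, e.g. 0.0.0.0:38508
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}
```

For example, `ss -ltn | port_in_use 38508 && echo "38508 is already bound"` would flag the port from the "cannot bind socket" case. It only sees listeners visible in the network namespace where it runs.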

jcmoraisjr · 2020-11-30T01:22:32Z

Hi, the two errors seem to have distinct causes: the latter is caused by haproxy failing to start, while the former apparently happened on a running haproxy instance. I'll leave this issue open; maybe I'll get a clue as to what's going on, or maybe you can reproduce it and run the proposed tests. Please update here if you have any news.

frogluo · 2020-12-09T06:43:33Z

What I can confirm now is that a port conflict leads to a reload failure, such as:

error reading admin socket: error reading response buffer: read unix @->/var/run/haproxy/admin.sock.1104.tmp: read: connection reset by peer
Starting proxy _tcp_476944_kme-web_38508: cannot bind socket [0.0.0.0:38508]

E1123 11:43:37.828224       6 instance.go:188] error reading admin socket: error connecting to unix socket /var/run/haproxy/admin.sock: dial unix /var/run/haproxy/admin.sock: connect: connection refused
E1123 11:43:38.328723       6 instance.go:188] error reading admin socket: error connecting to unix socket /var/run/haproxy/admin.sock: dial unix /var/run/haproxy/admin.sock: connect: connection refused
E1123 11:43:38.829045       6 instance.go:188] error reading admin socket: error connecting to unix socket /var/run/haproxy/admin.sock: dial unix /var/run/haproxy/admin.sock: connect: connection refused

But the error then came back twice, and not because of a port conflict. I found that I use keepalived to set a VIP on the node, and according to this guide:
TURNING ON PACKET FORWARDING AND NONLOCAL BINDING

setting: net.ipv4.ip_nonlocal_bind=1

Now I'll keep observing to see whether this problem occurs again.

anzersy · 2020-12-29T02:31:56Z

@jcmoraisjr Regarding this issue, could you please take a look at the reload script haproxy-reload.sh? The "reloading" function never gets a chance to run once the socat command returns non-zero.
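
The general mechanism is how `set -e` behaves in any shell script: the first command that exits non-zero ends the script, so later steps never run. The sketch below is a small illustration only, not the actual haproxy-reload.sh. run_steps is a hypothetical helper, and `false` stands in for a failing socat call.

```shell
#!/bin/sh
# run_steps is a hypothetical stand-in, not the real haproxy-reload.sh.
#   $1: option line to run first ("set -e", or ":" for no option)
#   $2: command standing in for the socat call ("true" or "false")
# A fresh sh process is used so the caller's own errexit state has no effect.
run_steps() {
    sh -c "$1; $2; echo reloading"
}

# With set -e, the failing stand-in ends the script before "reloading".
run_steps "set -e" false || echo "script aborted before reloading"

# Without set -e, the failure is ignored and the next step still runs.
run_steps ":" false
```

The first call prints "script aborted before reloading" and the second prints "reloading".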

jcmoraisjr · 2021-01-09T19:36:20Z

@anzersy thanks for the hint, #719 should fix this.

frogluo · 2021-03-01T10:13:54Z

I don't think this fixes the problem.

I modified the script haproxy-reload.sh and changed this line:
#set -e

This temporarily solves the problem: when haproxy reloads and fails to get the sockets, the docker container can exit and then be restarted by k8s.

alienth · 2022-01-19T15:41:56Z

@jcmoraisjr We're encountering the reload failure on 0.12.11. It coincided with other potentially interesting logs:

Jan 19 06:15:47 haproxy-ingress haproxy-ingress warn W0119 15:15:47.057766       7 dynupdate.go:346] unrecognized response adding/updating endpoint internal_http/srv603: No such server.
Jan 19 06:15:48 haproxy-ingress haproxy-ingress warn W0119 15:15:48.140094       7 instance.go:469] output from haproxy:
Jan 19 06:15:48 haproxy-ingress haproxy-ingress WARNING [WARNING] 018/151547 (187) : Failed to get the number of sockets to be transferred !
Jan 19 06:15:48 haproxy-ingress haproxy-ingress NOTICE [NOTICE] 018/151547 (187) : haproxy version is 2.2.19-7ea3822
Jan 19 06:15:48 haproxy-ingress haproxy-ingress ALERT [ALERT] 018/151547 (187) : Failed to get the sockets from the old process!
Jan 19 06:15:48 haproxy-ingress haproxy-ingress error E0119 15:15:48.140123       7 instance.go:303] error reloading server:

I'll poke around to see if I can find a proximate cause. If I do see one I'll be happy to contribute a PR.

alienth · 2022-01-19T20:30:36Z

There is some conjecture over here that this sort of symptom may be related to the alpine image of haproxy: haproxy/haproxy#1413

I'm gonna give non-alpine a shot to see what happens.

alienth · 2022-01-19T21:18:01Z

Very recent, potentially relevant commit: haproxy/haproxy@148d7a0

Fixed in 2.2.20: https://www.haproxy.org/download/2.2/src/CHANGELOG

jcmoraisjr · 2022-01-19T22:11:56Z

Hooray thanks for sharing =) we've a small amount of commits in the queue and will tag new versions shortly.

jcmoraisjr · 2022-01-22T20:09:45Z

v0.12.12 was just released with embedded haproxy version 2.2.20, which fixes this issue on libmusl, btw used on our base image. @frogluo I hope this also fixes the issue you've reported. Thank you @alienth for linking the points!

I've just removed the backlog tag, feel free to provide any update, otherwise this issue will be closed in a couple of weeks.

github-actions · 2022-02-28T10:21:10Z

This issue got stale and will be closed in 7 days.

huangjiasingle · 2022-10-13T10:28:18Z

v0.13.10 has the same problem; the primary cause is a port conflict.

frogluo added kind/bug status/needs-triage labels Nov 11, 2020

jcmoraisjr added lifecycle/backlog and removed status/needs-triage labels Nov 30, 2020

lenhard mentioned this issue Dec 3, 2021

haproxy failed to reload: Failed to connect to the old process socket #869

Open

jcmoraisjr removed the lifecycle/backlog label Jan 22, 2022

jcmoraisjr mentioned this issue Feb 12, 2022

Reload seems to sporadically reset running connections #899

Closed

github-actions bot added the lifecycle/stale label Feb 28, 2022

github-actions bot closed this as completed Mar 7, 2022
