Failed to get the sockets from the old process #696

Closed
frogluo opened this issue Nov 11, 2020 · 14 comments

@frogluo

frogluo commented Nov 11, 2020

Description of the problem

My haproxy-ingress deployment on k8s had been running for a long time, but today its logs showed:

W1111 02:08:41.422283       6 instance.go:457] output from haproxy:
[WARNING] 315/020841 (78) : We didn't get the expected number of sockets (expecting 253 got 0)
[NOTICE] 315/020841 (78) : haproxy version is 2.2.4-de45672
[ALERT] 315/020841 (78) : Failed to get the sockets from the old process!
E1111 02:08:41.422330       6 instance.go:301] error reloading server:
exit status 1

After that, services on k8s whose deployment image had just been updated became unreachable and returned HTTP status code 502.
Services that had not been updated (deployment/pod) kept working normally.

With --reload-strategy=native there is no problem,
so I think something may be wrong with reusesocket.

Environment information

k8s 1.15.7
HAProxy Ingress version: v0.12.1
v0.9 had the problem, so I upgraded to v0.12.1, which has it as well.

It's an R&D environment, so pods change very frequently.

Command-line options:

- name: haproxy-ingress
  image: xxx/k8s/haproxy-ingress:v0.12.1
  args:
  - --default-backend-service=kube-system/ingress-default-backend
  - --configmap=kube-system/haproxy-ingress
  - --sort-backends
  - --rate-limit-update=0.05
  - --tcp-services-configmap=kube-system/tcp-services

Global options:

apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-ingress
  namespace: kube-system
data:
  forwardfor: "ifmissing"
  max-connections: "65535"
  timeout-stop: "1800s"
  timeout-client: "1800s"
  timeout-server: "1800s"
  # nbthread: "4"
  use-haproxy-user: "true"
  slots-min-free: "1"
@jcmoraisjr
Owner

Hi, thanks for the detailed description. This is related to reading and reusing the listening sockets to perform a seamless reload; see HAProxy's -x command-line option.

A few questions below:

  • Is the old process still running, and is the unix socket still working, after the reload failure? E.g. does echo "show info" | socat - /var/run/haproxy/admin.sock properly show the haproxy info?
  • Does this happen on every restart, or only after running and restarting successfully for a while?
  • Any change if use-haproxy-user is "false"?
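
The unix-socket check from the first question can also be scripted. The following is a minimal Python sketch, assuming the admin socket path used in the socat command above; the helper names `parse_show_info` and `query_admin_socket` are hypothetical and are not part of haproxy-ingress.

```python
import socket


def parse_show_info(text: str) -> dict:
    """Parse the 'Key: value' lines of HAProxy's `show info` reply."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            info[key.strip()] = value.strip()
    return info


def query_admin_socket(path: str = "/var/run/haproxy/admin.sock",
                       command: str = "show info",
                       timeout: float = 2.0) -> str:
    """Send one command to the admin socket and return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)  # raises OSError if nothing is listening
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")


if __name__ == "__main__":
    try:
        info = parse_show_info(query_admin_socket())
        print("haproxy answered, pid:", info.get("Pid"))
    except OSError as exc:
        print("admin socket not usable:", exc)
```

Run inside the controller container, a "connection refused" result would suggest that no haproxy process is listening on the socket any more.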

@frogluo
Author

frogluo commented Nov 24, 2020

> Hi, thanks for the detailed description. This is related to reading and reusing the listening sockets to perform a seamless reload; see HAProxy's -x command-line option.
>
> A few questions below:
>
>   • Is the old process still running, and is the unix socket still working, after the reload failure? E.g. does echo "show info" | socat - /var/run/haproxy/admin.sock properly show the haproxy info?
>   • Does this happen on every restart, or only after running and restarting successfully for a while?
>   • Any change if use-haproxy-user is "false"?

Sorry for the late reply. The problem hasn't appeared since last time, so I couldn't run the checks you suggested.

But today there is another problem. On one node, the haproxy-ingress (v0.12.1) log shows:

E1123 11:43:37.828224       6 instance.go:188] error reading admin socket: error connecting to unix socket /var/run/haproxy/admin.sock: dial unix /var/run/haproxy/admin.sock: connect: connection refused
E1123 11:43:38.328723       6 instance.go:188] error reading admin socket: error connecting to unix socket /var/run/haproxy/admin.sock: dial unix /var/run/haproxy/admin.sock: connect: connection refused
E1123 11:43:38.829045       6 instance.go:188] error reading admin socket: error connecting to unix socket /var/run/haproxy/admin.sock: dial unix /var/run/haproxy/admin.sock: connect: connection refused

The reason is a port conflict.

calico-node was using a port inside my configured port range, and I then set a TCP service to the same port. That is when the error happened.

==============
Maybe last time had the same cause: a port conflict that made the reload fail.
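
A conflict of this kind can be detected before a port is assigned to a TCP service. The following is a minimal Python sketch, assuming it runs in the same network namespace in which haproxy binds its listeners; `port_is_free` is a hypothetical helper and the port numbers in the usage block are placeholders.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # SO_REUSEADDR avoids false negatives caused by old connections
        # lingering in TIME_WAIT; a live listener still makes bind() fail.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # Placeholder ports; replace with the ports planned for TCP services.
    for candidate in (30000, 30001):
        state = "free" if port_is_free(candidate) else "already in use"
        print(f"port {candidate}: {state}")
```

The check is only a snapshot: another process can still take the port between the check and the haproxy reload.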

@jcmoraisjr
Copy link
Owner

Hi, the two errors seem to have distinct causes: the latter is caused by haproxy failing to start, while the former apparently happened on a running haproxy instance. I'll leave this issue open. Maybe I'll find a clue as to what's going on, or maybe you can reproduce it and run the proposed tests. Please update here if you have any news.

@frogluo
Author

frogluo commented Dec 9, 2020

  1. What I can confirm now is that a port conflict leads to reload failure, for example:
error reading admin socket: error reading response buffer: read unix @->/var/run/haproxy/admin.sock.1104.tmp: read: connection reset by peer
Starting proxy _tcp_476944_kme-web_38508: cannot bind socket [0.0.0.0:38508]

E1123 11:43:37.828224       6 instance.go:188] error reading admin socket: error connecting to unix socket /var/run/haproxy/admin.sock: dial unix /var/run/haproxy/admin.sock: connect: connection refused
E1123 11:43:38.328723       6 instance.go:188] error reading admin socket: error connecting to unix socket /var/run/haproxy/admin.sock: dial unix /var/run/haproxy/admin.sock: connect: connection refused
E1123 11:43:38.829045       6 instance.go:188] error reading admin socket: error connecting to unix socket /var/run/haproxy/admin.sock: dial unix /var/run/haproxy/admin.sock: connect: connection refused

  2. But the error occurred two more times, and not because of a port conflict. I found that I had used keepalived to set a VIP on the node, and according to this page:
    TURNING ON PACKET FORWARDING AND NONLOCAL BINDING

the relevant setting is: net.ipv4.ip_nonlocal_bind=1

I'll keep observing to see whether the problem occurs again.
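
Whether the sysctl mentioned above is active on a node can be read from procfs. The following is a minimal Python sketch, assuming a Linux host; `nonlocal_bind_enabled` is a hypothetical helper, and the path is the standard procfs location of `net.ipv4.ip_nonlocal_bind`.

```python
from pathlib import Path


def nonlocal_bind_enabled(
        path: str = "/proc/sys/net/ipv4/ip_nonlocal_bind") -> bool:
    """Return True if the sysctl file at `path` contains the value 1."""
    return Path(path).read_text().strip() == "1"


if __name__ == "__main__":
    try:
        print("net.ipv4.ip_nonlocal_bind enabled:", nonlocal_bind_enabled())
    except OSError as exc:
        print("could not read the sysctl:", exc)
```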

@anzersy

anzersy commented Dec 29, 2020

@jcmoraisjr Regarding this issue, could you please take a look at the reload script haproxy-reload.sh? It looks like the "reloading" function never gets a chance to run once the socat command returns non-zero.

@jcmoraisjr
Owner

@anzersy thanks for the hint, #719 should fix this.

@frogluo
Author

frogluo commented Mar 1, 2021

> @anzersy thanks for the hint, #719 should fix this.

I don't think this fixed the problem.

I modified the haproxy-reload.sh script and uncommented this line:
#set -e

This works around the problem for now: when haproxy reloads and fails to get the sockets, the container can exit and then be restarted by k8s.

@alienth

alienth commented Jan 19, 2022

@jcmoraisjr We're encountering the reload failure on 0.12.11. It coincided with other potentially interesting logs:

Jan 19 06:15:47 haproxy-ingress haproxy-ingress warn W0119 15:15:47.057766       7 dynupdate.go:346] unrecognized response adding/updating endpoint internal_http/srv603: No such server.
Jan 19 06:15:48 haproxy-ingress haproxy-ingress warn W0119 15:15:48.140094       7 instance.go:469] output from haproxy:
Jan 19 06:15:48 haproxy-ingress haproxy-ingress WARNING [WARNING] 018/151547 (187) : Failed to get the number of sockets to be transferred !
Jan 19 06:15:48 haproxy-ingress haproxy-ingress NOTICE [NOTICE] 018/151547 (187) : haproxy version is 2.2.19-7ea3822
Jan 19 06:15:48 haproxy-ingress haproxy-ingress ALERT [ALERT] 018/151547 (187) : Failed to get the sockets from the old process!
Jan 19 06:15:48 haproxy-ingress haproxy-ingress error E0119 15:15:48.140123       7 instance.go:303] error reloading server:

I'll poke around to see if I can find a proximate cause. If I do see one I'll be happy to contribute a PR.

@alienth

alienth commented Jan 19, 2022

There is some conjecture over here that this sort of symptom may be related to the alpine image of haproxy: haproxy/haproxy#1413

I'm gonna give non-alpine a shot to see what happens.

@alienth

alienth commented Jan 19, 2022

Very recent, potentially relevant commit: haproxy/haproxy@148d7a0

Fixed in 2.2.20: https://www.haproxy.org/download/2.2/src/CHANGELOG

@jcmoraisjr
Owner

Hooray, thanks for sharing =) We have a few commits in the queue and will tag new versions shortly.

@jcmoraisjr
Owner

v0.12.12 was just released with embedded haproxy version 2.2.20, which fixes this issue on musl libc, the libc used by our base image. @frogluo I hope this also fixes the issue you've reported. Thank you @alienth for connecting the dots!

I've just removed the backlog tag, feel free to provide any update, otherwise this issue will be closed in a couple of weeks.

@github-actions

This issue got stale and will be closed in 7 days.

@huangjiasingle

v0.13.10 has the same problem; the primary cause is a port conflict.
