
Websocket connections not closed using Kong load balancer #83

Closed
landorg opened this issue Nov 8, 2019 · 11 comments

Comments

@landorg
Contributor

landorg commented Nov 8, 2019

Tell us about your environment

AnyCable-Go version:
v0.6.3
AnyCable gem version:
v0.6.3

What did you do?

Use Kong as a load balancer.

What did you expect to happen?

Websocket connections are closed after some time.

What actually happened?

WebSocket connections add up very fast until Kong reaches its limit (8-10k) and restarts with

[alert] 33#0: 16384 worker_connections are not enough

Before Kong we used the native Kubernetes nginx as a load balancer and the connections got closed properly. We usually had around 1k connections at peak times.
At first we thought the problem was that sticky sessions were not working properly. That should be fixed now, but we still see the same issue. (The first bump in the picture is from before we used sticky sessions with hash_on: ip.)
[screenshot 2019-11-08-141314_592x229_scrot: connection count graph]

EDIT: a graph from before we were using Kong:
[screenshot 2019-11-11-101217_887x232_scrot: connection count graph before Kong]

Are there any special settings that need to be set in the load balancer?
Any other ideas on how to resolve this problem?

Thank You

@sponomarev sponomarev changed the title Websocket connections not closed Websocket connections not closed using Kong load balancer Nov 8, 2019
@palkan
Member

palkan commented Nov 8, 2019

Don't know anything about Kong 🤷🏻‍♂️

we used sticky sessions hash_on: ip

Preferably, you should use "least open connections" for WebSockets; that would result in more uniform balancing.

But as I see here:

'least open connections' does not make sense in a Kong cluster

Anyway, that shouldn't be the reason for this issue.

As far as I understand, WebSockets load balancing works the following way:

  1. Client connects to the LB.
  2. LB connects to the upstream.
  3. Client disconnects from the LB.
  4. LB closes the corresponding connection to the upstream.

What could go wrong? I have two ideas:

  • Kong doesn't detect closed connections properly.
    I can't say anything about Kong internals, but maybe it's something lower, at the OS (or whatever it is in k8s?) layer, related to TCP settings (see, for example, https://docs.anycable.io/#/anycable-go/os_tuning?id=tcp-keepalive, and the sysctl sketch after this list).

  • Kong doesn't close/reap LB->upstream connections.
    Maybe Kong expects a specific closing code or status (which we do not set in anycable-go) and doesn't remove the connection, although the client has left.
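A minimal sketch of the keepalive tuning mentioned above (the exact values are illustrative assumptions, roughly in the spirit of the AnyCable OS tuning doc; run on the hosts that terminate the TCP connections):

# Check the current TCP keepalive settings (Linux)
sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes

# Lower them so dead peers are detected in minutes instead of hours
sudo sysctl -w net.ipv4.tcp_keepalive_time=300   # first probe after 5 min of idle
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=30   # re-probe every 30 s
sudo sysctl -w net.ipv4.tcp_keepalive_probes=5   # close the socket after 5 failed probes

# Persist across reboots by putting the same keys into /etc/sysctl.d/99-keepalive.conf
# and running: sudo sysctl --system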

Do you know how to reproduce this setup locally? Maybe via a docker-compose configuration with Kong and AnyCable? Anything that we can use for investigation is appreciated.
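Not a verified reproduction, but a rough sketch of what a local setup could look like (the image tags, ports, network name, and the declarative config below are assumptions; a real deployment also needs the AnyCable RPC server and Redis, omitted here):

# kong.yml: DB-less declarative config pointing Kong at anycable-go (hypothetical values)
cat > kong.yml <<'EOF'
_format_version: "1.1"
services:
  - name: anycable
    url: http://anycable-go:8080
    routes:
      - name: cable
        paths:
          - /cable
EOF

docker network create cable-net

# anycable-go (WebSocket server only; the RPC backend is left out of this sketch)
docker run -d --name anycable-go --network cable-net anycable/anycable-go:0.6.3

# Kong in DB-less mode using the declarative config above
docker run -d --name kong --network cable-net \
  -e KONG_DATABASE=off \
  -e KONG_DECLARATIVE_CONFIG=/kong/kong.yml \
  -v "$PWD/kong.yml:/kong/kong.yml" \
  -p 8000:8000 kong:1.4

# Open and close WebSocket clients against ws://localhost:8000/cable and compare
# Kong's open connections with anycable_go_clients_num over time.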

@palkan palkan added the question label Nov 8, 2019
@landorg
Contributor Author

landorg commented Nov 11, 2019

Thanks for your answer.
I'll try to provide you with a minimal example of our setup.
I also asked this question to the Kong folks here: https://discuss.konghq.com/t/anycable-websocket-connections-not-closed/4866

Regarding the TCP keepalive settings:
we use the default settings from Ubuntu:

net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9

As far as I understand, this means an idle WebSocket connection should be closed roughly two hours after the last activity, once all 9 probes fail (7200 s + 9 × 75 s ≈ 2 h 11 min). So there should be a drop in connections at some point if this worked, right? Here is the graph from the weekend with less traffic:
[screenshot 2019-11-11-103131_892x232_scrot: weekend connection count graph]

I'll add a graph of normal operation to my original question.

@le0pard

le0pard commented Mar 11, 2020

We had something similar on a project with the following scheme:

AWS ALB <-> nginx <-> anycable-go

The issue was net.ipv4.tcp_tw_reuse, which had value 1 (meaning enabled). It caused our WebSocket connections not to die, even with reduced TCP keepalive. The WebSocket connections stopped "leaking" after we disabled net.ipv4.tcp_tw_reuse (set it to 0).

Also, check that you don't have net.ipv4.tcp_tw_recycle set to 1. It can also cause issues and needs to be disabled (on newer Linux kernels net.ipv4.tcp_tw_recycle was removed).
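For anyone checking this on their own hosts, a quick sketch (standard Linux sysctls, nothing project-specific; the file under /etc/sysctl.d/ is just an example path):

# Inspect the current values (0 = disabled)
sysctl net.ipv4.tcp_tw_reuse
cat /proc/sys/net/ipv4/tcp_tw_recycle 2>/dev/null || echo "tcp_tw_recycle not present (removed in kernel 4.12+)"

# Disable tw_reuse at runtime
sudo sysctl -w net.ipv4.tcp_tw_reuse=0

# Persist the change across reboots
echo 'net.ipv4.tcp_tw_reuse = 0' | sudo tee /etc/sysctl.d/99-tcp-tw.conf
sudo sysctl --system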

Maybe it will help you @RolandG

@landorg
Contributor Author

landorg commented Apr 7, 2020

Thanks for the tip. We haven't had the issue since we re-enabled the orange cloud (proxying) in Cloudflare. We had it disabled before because of performance problems with it. Since then the connections are closed at a certain point by Cloudflare, I guess. Still, we might need this in the future.

@palkan palkan added stale and removed question labels Jun 10, 2020
@palkan palkan closed this as completed Jun 10, 2020
@landorg
Contributor Author

landorg commented Jul 22, 2020

@le0pard
Currently doing some tests.
Did you set these parameters on the nginx or on the anycable server?

@le0pard

le0pard commented Jul 22, 2020

@RolandG these are parameters for the Linux kernel - https://wiki.archlinux.org/index.php/sysctl

They were applied globally, for all processes on the system.

@scalp42

scalp42 commented Jan 19, 2021

@le0pard at which layer did you make the sysctl change? The Nginx one or the AnyCable one?

@le0pard

le0pard commented Jan 19, 2021

@scalp42 the sysctl changes were made at the system/OS level.

@scalp42

scalp42 commented Jan 19, 2021

@le0pard right, but on which host? The Nginx one or the AnyCable host?

@le0pard

le0pard commented Jan 19, 2021

It was the same host in my case @scalp42

@eneeyac

eneeyac commented Jan 25, 2021

Hi, guys
I have a similar issue with my anycable environment.

The anycable-go server is behind an Nginx reverse proxy, and according to ngx_http_stub_status_module (http://nginx.org/en/docs/http/ngx_http_stub_status_module.html) I see about 80 active connections, which looks close to reality for my app for that period of time.

[screenshot: Nginx active connections (stub_status)]

But at the same time the anycable_go_clients_uniq_num metric for anycable-go shows 360 clients, and anycable_go_clients_num shows more than 2000 clients. After I restarted anycable-go, anycable_go_clients_num became the same as what the Nginx metrics show.

[screenshot: anycable_go_clients_num / anycable_go_clients_uniq_num metrics]

And the anycable_go_mem_sys_bytes metric went down:

[screenshot: anycable_go_mem_sys_bytes metric]

On the other hand, when I restart Nginx, the anycable_go_clients_num and anycable_go_mem_sys_bytes metrics don't change. That makes me think it is anycable-go that keeps the outdated connections, not Nginx. Does that make sense?
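One way to double-check this (a hedged suggestion, assuming anycable-go listens on port 8080; adjust the port and the metrics endpoint to your setup) is to compare the kernel's view of established connections on the anycable-go host with the metric:

# Count TCP connections the kernel still considers established towards anycable-go
ss -Htn state established '( sport = :8080 )' | wc -l

# Show per-socket timers to verify the keepalive settings actually apply
ss -tno state established '( sport = :8080 )' | head

# Compare with what anycable-go itself reports (if the HTTP metrics endpoint is enabled)
curl -s http://localhost:8080/metrics | grep anycable_go_clients_num

If the kernel count is low while anycable_go_clients_num stays high, the stale entries live inside anycable-go; if both are high, the sockets really are being kept open at the TCP level.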

I tried to tune the OS keepalive settings as described in the official doc https://docs.anycable.io/#/v1/anycable-go/os_tuning?id=tcp-keepalive but nothing changed; the anycable_go_clients_num metric still shows a lot of clients.

I checked that net.ipv4.tcp_tw_recycle = 0 and that cat /proc/sys/net/ipv4/tcp_tw_recycle returns "No such file or directory".

I also tried the libkeepalive library (http://libkeepalive.sourceforge.net) as described in
https://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/#libkeepalive but with the same result: the anycable_go_clients_num metric shows a lot of clients.

Could you please advise what may be wrong? I would appreciate any help.
