add tcp-backlog config #1643
Comments
Thank you, @ice-cronus, for submitting the issue. @romange, @ice-cronus would like to work on this, any reason why not? (I remember Roy was cooking something, but my memory may be deceiving me.)
What I was working on is auto configuration of …
@kostasrim I would rather not work on that, as I am unfamiliar with your codebase :) Could you, or someone else, handle it?
@royjacobson Oh yes, thanks for pointing that out! @ice-cronus if @romange approves, I can make a quick PR. It's a change of just a few lines.
Wow, slow down cowboys :)
You are late to the party 🥳 (I know it's a non-working day in Israel -- I am joking). @romange I think the use case is that they initiate more than 128 connections at the same time, so probably this flag puts an upper bound on the number of connections. And of course it should, because it will call listen() with that backlog value.
So when the limit is reached, new incoming connections get rejected? Am I missing something?
It's almost accurate. This article provides what I think is the complete picture: http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html Basically, the Linux kernel indeed maintains the backlog for the accept queue, but that does not mean that once the queue is full the incoming connections are rejected. In fact, you can hack …
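For readers who want to see where that backlog number actually lands, here is a minimal Go sketch (not Dragonfly's actual listener code, which is C++): it only illustrates that the value a tcp-backlog style flag would carry ends up as the second argument to listen(2). The port and backlog used here are illustrative assumptions.

```go
package main

import (
	"log"
	"syscall"
)

func main() {
	backlog := 128 // the value a tcp-backlog style flag would carry (assumed)

	fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
	if err != nil {
		log.Fatal(err)
	}
	defer syscall.Close(fd)

	// Bind to an illustrative port; 6379 is only an assumption here.
	addr := &syscall.SockaddrInet4{Port: 6379}
	if err := syscall.Bind(fd, addr); err != nil {
		log.Fatal(err)
	}

	// The backlog caps the accept queue; on Linux it is further clamped by
	// net.core.somaxconn, and overflowing it does not reject connections
	// outright, it leaves them waiting in the SYN queue (see the article above).
	if err := syscall.Listen(fd, backlog); err != nil {
		log.Fatal(err)
	}

	// An accept loop would follow here.
}
```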
@romange I just checked with netstat, and yes, I have a lot of SYN_SENT connections. The values are jumping (the values below were picked a few seconds apart), so it is hard to monitor.
Yes, I checked it with a debugger: it recreates the connection from the pool here, but according to the debugger isHealthyConn does not return false, so it seems there are no idle conns in the pool and it tries to establish a new one. I'll play around with the pool size to check whether the problem goes away with more connections in the pool.
Well, I figured out why I had a lot of re-establishing / opening connections. I was using poolOpts.MaxIdleConnections = 1, and as a result the lib closes the connection instead of marking it idle and reusing it later.
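For context, here is a hedged sketch of what a less aggressive pool configuration might look like with the go-redis v9 client; the concrete numbers are assumptions for illustration, not values recommended in this thread. The point is that a very low idle-connection cap makes the pool close connections that could have been reused, forcing frequent re-establishment.

```go
package main

import (
	"time"

	"github.com/redis/go-redis/v9"
)

// newClient builds a client whose pool keeps idle connections around instead
// of closing them; all numbers below are illustrative assumptions.
func newClient() *redis.Client {
	return redis.NewClient(&redis.Options{
		Addr:            "localhost:6379",
		PoolSize:        800,             // roughly the expected concurrency
		MinIdleConns:    50,              // keep some connections warm
		MaxIdleConns:    800,             // don't discard connections that go idle briefly
		ConnMaxIdleTime: 5 * time.Minute, // still recycle connections idle for too long
	})
}
```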
@kostasrim Could you reopen and fix it anyway? Redis has that one configurable, and it is highly recommended to tune it for high load.
@romange it's your call :)
@ice-cronus I am sorry, but tuning the TCP backlog for high load is not recommended. In fact, it makes the backend more prone to connection storms. It's recommended for HTTP backends (lb/proxies) that need to handle huge amounts of accepts. I would not want Dragonfly users to waste their time tuning irrelevant options 🍺
Any updates on this? We've updated the pool settings mentioned above, but we're still getting those timeout errors from time to time. Probably, when concurrency and the total number of connections are high, there is a chance of re-establishing more than 128 connections at once.
@ice-cronus do you monitor your Dragonfly instances? If yes, could you share graphs of connected_clients?
Ok, sounds good. Do you know why you have these disconnects in the first place? Could you provide …
I am not sure yet why we're getting those connection re-establish cases. But the connected_clients metric is roughly the same over time; it is 1448, and sometimes it increases by up to +10 and then goes back to that value. Here is the result of …
Some updates on the issue. In case hundreds or more connections are created at the same time, Linux moves only the first connections (up to the backlog) into the accept queue, while the rest stay in the SYN queue. So, if our backlog is 128 and more than 128 clients try to connect and Dragonfly has not immediately unloaded them from the accept queue, then some of the connections will find themselves in the SYN queue, which will automatically prolong the connect time for them to at least 1 second.
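A small, hypothetical Go probe (not something posted in this thread) can make that effect visible from the client side: it opens many TCP connections concurrently and reports how long each connect took, so dials that land in the SYN queue should show the roughly one-second penalty described above. The address and connection count are assumptions.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

func main() {
	const addr = "localhost:6379" // assumed server address
	const clients = 512           // deliberately above a backlog of 128

	var wg sync.WaitGroup
	for i := 0; i < clients; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			start := time.Now()
			conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
			elapsed := time.Since(start)
			if err != nil {
				fmt.Printf("conn %d: error after %v: %v\n", id, elapsed, err)
				return
			}
			defer conn.Close()
			// Connects that waited out a SYN-ACK retransmit show up here.
			if elapsed > 500*time.Millisecond {
				fmt.Printf("conn %d: slow connect: %v\n", id, elapsed)
			}
		}(i)
	}
	wg.Wait()
}
```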
@romange do I read your last comment correctly in that you are OK with implementing a tcp-backlog flag?
Yes :) Now we understand the context and why this flag can be useful.
You had me at "some of the connections will [...] prolong the connect time for them to at least 1 second" 🤣
Is your feature request related to a problem? Please describe.
Hi, guys! We're experiencing some connectivity issues when running a lot of connections with high concurrency (using the golang redis v9 client), with errors like this: …
We're running highly frequent queries over roughly 800 concurrent connections.
I figured out that the issue seems to be related to the TCP backlog size, because when I run the same client code against the default Redis Docker image with a tuned tcp-backlog configuration, I do not experience such issues.
As I see in the code, the server is running with the default backlog of 128, and the lib has a function to adjust it, but it is never called from the Dragonfly server side.
Describe the solution you'd like
A flag / configuration parameter should be added to adjust the tcp-backlog value.
Describe alternatives you've considered
Or make the default behaviour unbounded.
Additional context
Discord thread