
add tcp-backlog config #1643

Closed
ice-cronus opened this issue Aug 4, 2023 · 22 comments · Fixed by #2456
ice-cronus commented Aug 4, 2023

Is your feature request related to a problem? Please describe.
Hi, guys! We're experiencing connectivity issues when running a lot of connections with high concurrency (using the golang redis v9 client), with errors like this:

failed to get [...]: dial tcp 127.0.0.1:6379: i/o timeout

We're running high-frequency queries over ~800 concurrent connections.

I figured out that the issue seems to be related to the tcp backlog size, because when I run the same client code against the default redis docker image with a tuned tcp-backlog configuration, I do not experience such issues.

As far as I can see in the code, the server is running with the default backlog of 128, and the lib has a function to adjust it, but it is never called from the dragonfly server side.
Describe the solution you'd like
A flag / configuration parameter should be added to adjust the tcp-backlog value.

Describe alternatives you've considered
Alternatively, make the default behaviour unbounded.

Additional context
Discord thread
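
Roughly, the client side looks like the sketch below (a simplified illustration, not our exact production code: the key name, pool size, and timeouts are placeholder values):

package main

import (
    "context"
    "log"
    "sync"
    "time"

    "github.com/redis/go-redis/v9"
)

func main() {
    rdb := redis.NewClient(&redis.Options{
        Addr:        "127.0.0.1:6379",
        PoolSize:    800,             // ~800 concurrent connections, as described above
        DialTimeout: 5 * time.Second, // dials slower than this surface as "i/o timeout"
    })
    defer rdb.Close()

    var wg sync.WaitGroup
    for i := 0; i < 800; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            ctx := context.Background()
            for j := 0; j < 1000; j++ {
                if err := rdb.Get(ctx, "some-key").Err(); err != nil && err != redis.Nil {
                    log.Printf("get failed: %v", err) // e.g. dial tcp 127.0.0.1:6379: i/o timeout
                }
            }
        }()
    }
    wg.Wait()
}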

kostasrim (Contributor) commented Aug 4, 2023

Thank you, @ice-cronus, for submitting an issue. @romange, @ice-cronus would like to work on this; any reason why not? (I remember Roy was cooking something, but my memory may deceive me.)

royjacobson (Contributor) commented:
What I was working on is auto-configuration of ulimit for open files; it's unrelated to the network stack :)

ice-cronus (Author) commented:
@kostasrim I would rather not work on that, as I am unfamiliar with your codebase :) Could you, or someone else, handle it?

kostasrim (Contributor) commented:
@royjacobson Oh yes, thanks for pointing that out!

@ice-cronus if @romange approves, I can make a quick PR. It's a change of just a few lines.

romange (Collaborator) commented Aug 4, 2023

Wow, slow down, cowboys :) The backlog is ONLY relevant when a new connection is initiated. Once the connection is established it does not affect anything. @ice-cronus does your app constantly try to connect and disconnect? Is it possible to reproduce it on our side?

kostasrim (Contributor) commented:
You are late to the party 🥳 (I know it's a non-working day in Israel -- I am joking)

@romange I think the use case is that they initiate more than 128 connections at the same time, so presumably this flag puts an upper bound on the number of pending connections. And of course it should, because it is passed to listen, which is declared as:

/* Prepare to accept connections on socket FD.
   N connection requests will be queued before further requests are refused.
   Returns 0 on success, -1 for errors.  */
extern int listen (int __fd, int __n) __THROW;

So when the limit __n is reached, further connections will be rejected.
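
For illustration, here is the same call chain as a standalone Go sketch (hypothetical helper name and port, not Dragonfly's code; note that the kernel additionally caps the value with net.core.somaxconn):

package main

import (
    "log"
    "syscall"
)

// listenWithBacklog creates a TCP listening socket with an explicit backlog,
// i.e. the __n argument from the glibc declaration above.
func listenWithBacklog(port, backlog int) (int, error) {
    fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
    if err != nil {
        return -1, err
    }
    addr := &syscall.SockaddrInet4{Port: port} // 0.0.0.0:port
    if err := syscall.Bind(fd, addr); err != nil {
        syscall.Close(fd)
        return -1, err
    }
    if err := syscall.Listen(fd, backlog); err != nil {
        syscall.Close(fd)
        return -1, err
    }
    return fd, nil
}

func main() {
    fd, err := listenWithBacklog(6380, 1024)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("listening on fd %d with backlog 1024", fd)
    // A real server would now loop over syscall.Accept(fd).
}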

Am I missing something?

romange (Collaborator) commented Aug 4, 2023

It's almost accurate. This article provides what I think is the complete picture: http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html

Basically, the Linux kernel indeed maintains a backlog for the accept queue. But that does not mean that incoming connections are rejected once the queue is full. In fact, you can hack echo_server to stop accepting connections and see for yourself. In that case the Linux kernel does not send back a SYN-ACK, and the client retries sending SYN packets again and again (usually for up to 5 min).
If during that time the accept queue is drained, then the next SYN will be ACKed and the connection will be accepted.
This is why you can open thousands of connections with memtier_benchmark - in the worst case it just takes more time for some connections to open.
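
If you want to see this from the client side, here is a small sketch (my illustration, not our code; it assumes a non-accepting listener with a small backlog on 127.0.0.1:6380, e.g. the listenWithBacklog sketch from the earlier comment called with a backlog of 4). The excess dials are not refused; they sit in SYN retransmits and either complete later, once the queue drains, or eventually hit the dial timeout, which is exactly the "i/o timeout" from the original report.

package main

import (
    "log"
    "net"
    "sync"
    "time"
)

func main() {
    const target = "127.0.0.1:6380" // hypothetical address of the non-accepting listener
    var wg sync.WaitGroup
    for i := 0; i < 50; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            start := time.Now()
            conn, err := net.DialTimeout("tcp", target, 10*time.Second)
            if err != nil {
                log.Printf("conn %d: failed after %v: %v", id, time.Since(start), err)
                return
            }
            defer conn.Close()
            log.Printf("conn %d: established after %v", id, time.Since(start))
        }(i)
    }
    wg.Wait()
}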

ice-cronus (Author) commented:
@romange I just checked with netstat - and yes, I have a lot of SYN_SENT connections. The values are jumping around (the values below were taken a few seconds apart), so it is hard to monitor.
And I am not sure what the reason for the connection re-creation is; we're using the pool package, so probably it is somewhere in there. I'll try to understand what happens with those connections.

$ netstat -a | grep 6379 | grep SYN_SENT | wc -l
331
$ netstat -a | grep 6379 | grep ESTABLISHED | wc -l
749
$ netstat -a | grep 6379 | grep ESTABLISHED | wc -l
1007
$ netstat -a | grep 6379 | grep SYN_SENT | wc -l
357

ice-cronus (Author) commented:
Yes, I checked it with a debugger: it recreates the connection from the pool here, but according to the debugger isHealthyConn does not return false, so it seems there are no idle conns in the pool and it tries to establish a new one. I'll play around with the pool size to check whether the problem goes away with more connections in the pool.

ice-cronus (Author) commented:
Well, I figured out why I had so many connections being re-established / opened. I was using poolOpts.MaxIdleConnections = 1, and as a result the lib closes the connection instead of marking it idle and reusing it later.
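
For reference, the idle-pool knobs involved look roughly like this in go-redis v9 (field names as I read the v9 Options struct; the values are just an illustration, not a recommendation):

package main

import (
    "time"

    "github.com/redis/go-redis/v9"
)

func newClient() *redis.Client {
    return redis.NewClient(&redis.Options{
        Addr:            "127.0.0.1:6379",
        PoolSize:        800,              // upper bound on pooled connections
        MinIdleConns:    100,              // keep some connections warm
        MaxIdleConns:    800,              // don't aggressively close idle connections
        ConnMaxIdleTime: 30 * time.Minute, // only reap connections that stay idle for a long time
    })
}

func main() {
    rdb := newClient()
    defer rdb.Close()
}

With MaxIdleConns left at the pool size, a connection returned to the pool should be kept rather than closed, so the client should not need to re-dial under steady load.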

ice-cronus (Author) commented Aug 4, 2023

@kostasrim Could you reopen and fix it anyway? Because redis has this configurable, and it is highly recommended to tune it for high load.

kostasrim reopened this Aug 4, 2023
kostasrim (Contributor) commented:
@romange it's your call :)

romange (Collaborator) commented Aug 4, 2023

@ice-cronus I am sorry, but it's not recommended to tune the tcp backlog for high load. In fact, it makes the backend more prone to connection storms. It's recommended to tune it for HTTP backends (lb/proxies) that need to handle huge numbers of accepts. I do not want Dragonfly users to waste their time tuning irrelevant options 🍺

romange added the "enhancement" (New feature or request) and "minor nice to have enhancement" labels on Aug 7, 2023
romange changed the title from "tcp backlog is not configurable, it leads to i/o timeout errors with high throughput / concurrency" to "add tcp-backlog config" on Aug 9, 2023
ice-cronus (Author) commented:
Any updates on this? We've updated the pool settings mentioned above, but we're still getting those timeout errors from time to time. Probably, when concurrency and the total number of connections are high, there is a chance of re-establishing more than 128 connections at once.

romange (Collaborator) commented Sep 18, 2023

@ice-cronus do you monitor your dragonfly instances? If yes, could you share graphs of total_connections_received and instantaneous_ops_per_sec around the time you experience those timeouts? Also, I am curious to see the CPU load of the instance during these incidents.

ice-cronus (Author) commented:
Here is a graph of dragonfly_connections_received_total. We got the issue around 0:11, and the delta is 3101982 - 3100070 = 1912 > 128; I think that's the reason why we're getting those errors.

[graph: dfly_conns]

And I failed to find a metric for instantaneous_ops_per_sec or anything similar.

romange (Collaborator) commented Sep 20, 2023

Ok, sounds good. Do you know why you have these disconnects in the first place? Could you provide INFO ALL output from that instance?

ice-cronus (Author) commented Sep 20, 2023

I am not sure yet why we're getting those connection re-establishments. But the connected_clients metric is roughly constant over time: it sits at 1448, occasionally increasing by up to +10 and then going back to that value.

Here is result of INFO ALL:

INFO ALL
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Server
redis_version:6.2.11
dfly_version:df-v1.6.2
redis_mode:standalone
arch_bits:64
multiplexing_api:iouring
tcp_port:6379
uptime_in_seconds:2512235
uptime_in_days:29

# Clients
connected_clients:1449
client_read_buf_capacity:28684544
blocked_clients:0

# Memory
used_memory:12720191536
used_memory_human:11.85GiB
used_memory_peak:12720645304
comitted_memory:-1120105296160
used_memory_rss:19089059840
used_memory_rss_human:17.78GiB
object_used_memory:10009989304801
table_used_memory:2151049008
num_buckets:40825680
num_entries:37334314
inline_keys:21411245
strval_bytes:191391928
updateval_amount:10172386230460
listpack_blobs:871
listpack_bytes:354124
small_string_bytes:708661960
pipeline_cache_bytes:235417156
maxmemory:268435456000
maxmemory_human:250.00GiB
cache_mode:store
maxmemory_policy:noeviction

# Stats
total_connections_received:3148610
total_commands_processed:1615529695622
instantaneous_ops_per_sec:659942
total_pipelined_commands:1608255975356
total_net_input_bytes:1163484033688993
total_net_output_bytes:379361264694750
instantaneous_input_kbps:-1
instantaneous_output_kbps:-1
rejected_connections:-1
expired_keys:851878734
evicted_keys:0
hard_evictions:0
garbage_checked:218869167
garbage_collected:188089573
bump_ups:0
stash_unloaded:179170
oom_rejections:0
traverse_ttl_sec:0
delete_ttl_sec:0
keyspace_hits:1029139371082
keyspace_misses:175574394678
total_reads_processed:47486081831
total_writes_processed:372819118247
async_writes_count:0
defrag_attempt_total:0
defrag_realloc_total:0
defrag_task_invocation_total:0

# TIERED
tiered_entries:0
tiered_bytes:0
tiered_reads:0
tiered_writes:0
tiered_reserved:0
tiered_capacity:0
tiered_aborted_write_total:0
tiered_flush_skip_total:0

# PERSISTENCE
last_save:1695208814
last_save_duration_sec:14
last_save_file:XXXX
loading:0
rdb_hash:3851807
rdb_set:20773514
rdb_other:2922
rdb_string:12703867
rdb_changes_since_last_save:29117824

# Replication
role:master
connected_slaves:1
slave0:ip=XXXXX
master_replid:XXXXX

# Commandstats
cmd_AUTH:3220128
cmd_CLIENT:145285
cmd_DECR:2140
cmd_DEL:14074
cmd_DFLY:292
cmd_EVALSHA:1376106
cmd_EXEC:54787850
cmd_GET:3466267100
cmd_HELLO:145884
cmd_HGET:1
cmd_HGETALL:1
cmd_HINCRBY:106259537
cmd_HINCRBYFLOAT:41519
cmd_HLEN:686958
cmd_HMGET:1197018221699
cmd_HSET:412024498850
cmd_HSETNX:2757394
cmd_INCR:967762172
cmd_INFO:4
cmd_MULTI:54787851
cmd_PING:301054649
cmd_QUIT:518
cmd_REPLCONF:404909075
cmd_ROLE:2932836
cmd_SADD:12397582
cmd_SET:673780799
cmd_SETEX:186300646
cmd_SETNX:689437
cmd_SREM:468851
cmd_SSCAN:248253
cmd_ZADD:656785
cmd_ZREM:13748
cmd_ZREVRANGEBYSCORE:4949481
cmd_ZREVRANK:202258524

# Errorstats
[2]race condition:66
[1]race condition:2140
EXEC without MULTI:1
syntax_error:1
protocol_error:4
syncid not found:47
Can not execute during LOADING:38030460

# Keyspace
db0:keys=37334314,expires=8850689,avg_ttl=-1

# CPU
used_cpu_sys:20515437.68552
used_cpu_user:53587176.902686
used_cpu_sys_children:0.0
used_cpu_user_children:0.0
used_cpu_sys_main_thread:398022.5938
used_cpu_user_main_thread:1095559.629527

# Cluster
cluster_enabled:0

romange (Collaborator) commented Jan 21, 2024

Some updates on the issue. In case hundreds of connections or more are created at the same time, Linux moves the first backlog-many established connections into the accept queue. All the others are put into a special SYN queue (https://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html). For these connections, the Linux server does not send SYN-ACK responses (i.e. it silently ignores them), which triggers the re-transmit flow on the client side. The re-transmit flow has exponential back-off, with the initial value hardcoded in Linux to 1 second; see https://serverfault.com/questions/1043758/initial-syn-timeout-on-linux

So, if our backlog is 128, more than 128 clients try to connect, and Dragonfly has not immediately unloaded them from the accept queue, then some of the connections will find themselves in the SYN queue, which automatically prolongs their connect time to at least 1 second.
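
For anyone debugging this on their own machine, here is a small sketch (assuming a Linux host) that prints the kernel knobs involved: net.core.somaxconn caps whatever backlog the server passes to listen(2), tcp_max_syn_backlog bounds the SYN queue, and tcp_syn_retries controls how many times the client retransmits the SYN with that exponential back-off.

package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    for _, p := range []string{
        "/proc/sys/net/core/somaxconn",           // cap on the listen(2) backlog
        "/proc/sys/net/ipv4/tcp_max_syn_backlog", // size of the SYN queue
        "/proc/sys/net/ipv4/tcp_syn_retries",     // client-side SYN retransmit attempts
    } {
        data, err := os.ReadFile(p)
        if err != nil {
            fmt.Printf("%s: %v\n", p, err)
            continue
        }
        fmt.Printf("%s = %s\n", p, strings.TrimSpace(string(data)))
    }
}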

romange added the "Next Up" label (task that is ready to be worked on and should be added to working queue) on Jan 21, 2024
chakaz (Collaborator) commented Jan 22, 2024

@romange do I read your last comment correctly in that you are OK with implementing backlog as a configurable flag?

romange (Collaborator) commented Jan 22, 2024

Yes :) now we understand the context and why this flag can be useful.

chakaz (Collaborator) commented Jan 22, 2024

You had me at "some of the connections will [...] prolong the connect time for them to at least 1 second" 🤣
