
add tcp-backlog config #1643

Closed
ice-cronus opened this issue Aug 4, 2023 · 22 comments · Fixed by #2456
ice-cronus commented Aug 4, 2023

Is your feature request related to a problem? Please describe.
Hi, guys! We're experiencing connectivity issues when running a lot of connections with high concurrency (using the golang redis v9 client), with errors like this:

failed to get [...]: dial tcp 127.0.0.1:6379: i/o timeout

We're running high-frequency queries over ~800 concurrent connections.

I figured out that the issue seems to be related to the tcp backlog size, because when I run the same client code against the default redis docker image with a tuned tcp-backlog configuration, I do not experience such issues.

As far as I can see in the code, the server is running with the default backlog of 128, and the lib has a function to adjust it, but it is never called from the dragonfly server side.
Describe the solution you'd like
A flag / configuration parameter should be added to adjust the tcp-backlog value.

Describe alternatives you've considered
Alternatively, make the default behaviour unbounded.

Additional context
Discord thread
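
Roughly, the client side looks like the sketch below (a simplified illustration, not our exact production code: the key name, pool size, and timeouts are placeholder values):

package main

import (
    "context"
    "log"
    "sync"
    "time"

    "github.com/redis/go-redis/v9"
)

func main() {
    rdb := redis.NewClient(&redis.Options{
        Addr:        "127.0.0.1:6379",
        PoolSize:    800,             // ~800 concurrent connections, as described above
        DialTimeout: 5 * time.Second, // dials slower than this surface as "i/o timeout"
    })
    defer rdb.Close()

    var wg sync.WaitGroup
    for i := 0; i < 800; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            ctx := context.Background()
            for j := 0; j < 1000; j++ {
                if err := rdb.Get(ctx, "some-key").Err(); err != nil && err != redis.Nil {
                    log.Printf("get failed: %v", err) // e.g. dial tcp 127.0.0.1:6379: i/o timeout
                }
            }
        }()
    }
    wg.Wait()
}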

kostasrim (Contributor) commented Aug 4, 2023

Thank you, @ice-cronus, for submitting an issue. @romange, @ice-cronus would like to work on this; any reason why not? (I remember Roy was cooking something, but my memory may deceive me.)

royjacobson (Contributor) commented:
What I was working on is auto-configuration of ulimit for open files; it's unrelated to the network stack :)

ice-cronus (Author) commented:
@kostasrim I would rather not work on that, as I am unfamiliar with your codebase :) Could you, or someone else, handle it?

kostasrim (Contributor) commented:
@royjacobson Oh yes, thanks for pointing that out!

@ice-cronus if @romange approves, I can make a quick PR. It's a change of just a few lines.

romange (Collaborator) commented Aug 4, 2023

Wow, slow down, cowboys :) The backlog is ONLY relevant when a new connection is initiated. Once the connection is established it does not affect anything. @ice-cronus does your app constantly try to connect and disconnect? Is it possible to reproduce it on our side?

kostasrim (Contributor) commented:
You are late to the party 🥳 (I know it's a non-working day in Israel -- I am joking)

@romange I think the use case is that they initiate more than 128 connections at the same time, so presumably this flag puts an upper bound on the number of pending connections. And of course it should, because it is passed to listen, which is declared as:

/* Prepare to accept connections on socket FD.
   N connection requests will be queued before further requests are refused.
   Returns 0 on success, -1 for errors.  */
extern int listen (int __fd, int __n) __THROW;

So when the limit __n is reached, further connections will be rejected.
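
For illustration, here is the same call chain as a standalone Go sketch (hypothetical helper name and port, not Dragonfly's code; note that the kernel additionally caps the value with net.core.somaxconn):

package main

import (
    "log"
    "syscall"
)

// listenWithBacklog creates a TCP listening socket with an explicit backlog,
// i.e. the __n argument from the glibc declaration above.
func listenWithBacklog(port, backlog int) (int, error) {
    fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
    if err != nil {
        return -1, err
    }
    addr := &syscall.SockaddrInet4{Port: port} // 0.0.0.0:port
    if err := syscall.Bind(fd, addr); err != nil {
        syscall.Close(fd)
        return -1, err
    }
    if err := syscall.Listen(fd, backlog); err != nil {
        syscall.Close(fd)
        return -1, err
    }
    return fd, nil
}

func main() {
    fd, err := listenWithBacklog(6380, 1024)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("listening on fd %d with backlog 1024", fd)
    // A real server would now loop over syscall.Accept(fd).
}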

Am I missing something?

romange (Collaborator) commented Aug 4, 2023

It's almost accurate. This article provides what I think is the complete picture: http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html

Basically, the Linux kernel indeed maintains a backlog for the accept queue. But that does not mean that incoming connections are rejected once the queue is full. In fact, you can hack echo_server to stop accepting connections and see for yourself. In that case the Linux kernel does not send back a SYN-ACK, and the client retries sending SYN packets again and again (usually for up to 5 min).
If during that time the accept queue is drained, then the next SYN will be ACKed and the connection will be accepted.
This is why you can open thousands of connections with memtier_benchmark - in the worst case it just takes more time for some connections to open.
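
If you want to see this from the client side, here is a small sketch (my illustration, not our code; it assumes a non-accepting listener with a small backlog on 127.0.0.1:6380, e.g. the listenWithBacklog sketch from the earlier comment called with a backlog of 4). The excess dials are not refused; they sit in SYN retransmits and either complete later, once the queue drains, or eventually hit the dial timeout, which is exactly the "i/o timeout" from the original report.

package main

import (
    "log"
    "net"
    "sync"
    "time"
)

func main() {
    const target = "127.0.0.1:6380" // hypothetical address of the non-accepting listener
    var wg sync.WaitGroup
    for i := 0; i < 50; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            start := time.Now()
            conn, err := net.DialTimeout("tcp", target, 10*time.Second)
            if err != nil {
                log.Printf("conn %d: failed after %v: %v", id, time.Since(start), err)
                return
            }
            defer conn.Close()
            log.Printf("conn %d: established after %v", id, time.Since(start))
        }(i)
    }
    wg.Wait()
}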

ice-cronus (Author) commented:
@romange I just checked with netstat - and yes, I have a lot of SYN_SENT connections. The values are jumping around (the values below were taken a few seconds apart), so it is hard to monitor.
And I am not sure what the reason for the connection re-creation is; we're using the pool package, so probably it is somewhere in there. I'll try to understand what happens with those connections.

$ netstat -a | grep 6379 | grep SYN_SENT | wc -l
331
$ netstat -a | grep 6379 | grep ESTABLISHED | wc -l
749
$ netstat -a | grep 6379 | grep ESTABLISHED | wc -l
1007
$ netstat -a | grep 6379 | grep SYN_SENT | wc -l
357

ice-cronus (Author) commented:
Yes, I checked it with a debugger: it recreates the connection from the pool here, but according to the debugger isHealthyConn does not return false, so it seems there are no idle conns in the pool and it tries to establish a new one. I'll play around with the pool size to check whether the problem goes away with more connections in the pool.

ice-cronus (Author) commented:
Well, I figured out why I had so many connections being re-established / opened. I was using poolOpts.MaxIdleConnections = 1, and as a result the lib closes the connection instead of marking it idle and reusing it later.
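
For reference, the idle-pool knobs involved look roughly like this in go-redis v9 (field names as I read the v9 Options struct; the values are just an illustration, not a recommendation):

package main

import (
    "time"

    "github.com/redis/go-redis/v9"
)

func newClient() *redis.Client {
    return redis.NewClient(&redis.Options{
        Addr:            "127.0.0.1:6379",
        PoolSize:        800,              // upper bound on pooled connections
        MinIdleConns:    100,              // keep some connections warm
        MaxIdleConns:    800,              // don't aggressively close idle connections
        ConnMaxIdleTime: 30 * time.Minute, // only reap connections that stay idle for a long time
    })
}

func main() {
    rdb := newClient()
    defer rdb.Close()
}

With MaxIdleConns left at the pool size, a connection returned to the pool should be kept rather than closed, so the client should not need to re-dial under steady load.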

ice-cronus (Author) commented Aug 4, 2023

@kostasrim Could you reopen and fix it anyway? Because redis has this configurable, and it is highly recommended to tune it for high load.

kostasrim reopened this Aug 4, 2023
kostasrim (Contributor) commented:
@romange it's your call :)

romange (Collaborator) commented Aug 4, 2023

@ice-cronus I am sorry, but it's not recommended to tune the tcp backlog for high load. In fact, it makes the backend more prone to connection storms. It's recommended to tune it for HTTP backends (lb/proxies) that need to handle huge numbers of accepts. I do not want Dragonfly users to waste their time tuning irrelevant options 🍺

romange added the "enhancement" (New feature or request) and "minor nice to have enhancement" labels on Aug 7, 2023
romange changed the title from "tcp backlog is not configurable, it leads to i/o timeout errors with high throughput / concurrency" to "add tcp-backlog config" on Aug 9, 2023
ice-cronus (Author) commented:
Any updates on this? We've updated the pool settings mentioned above, but we're still getting those timeout errors from time to time. Probably, when concurrency and the total number of connections are high, there is a chance of re-establishing more than 128 connections at once.

romange (Collaborator) commented Sep 18, 2023

@ice-cronus do you monitor your dragonfly instances? If yes, could you share graphs of total_connections_received and instantaneous_ops_per_sec around the time you experience those timeouts? Also, I am curious to see the CPU load of the instance during these incidents.

ice-cronus (Author) commented:
Here is a graph of dragonfly_connections_received_total. We got the issue around 0:11, and the delta is 3101982 - 3100070 = 1912 > 128; I think that's the reason why we're getting those errors.

[graph: dfly_conns]

And I failed to find a metric for instantaneous_ops_per_sec or anything similar.

romange (Collaborator) commented Sep 20, 2023

Ok, sounds good. Do you know why you have these disconnects in the first place? Could you provide INFO ALL output from that instance?

ice-cronus (Author) commented Sep 20, 2023

I am not sure yet why we're getting those connection re-establishments. But the connected_clients metric is roughly constant over time: it sits at 1448, occasionally increasing by up to +10 and then going back to that value.

Here is result of INFO ALL:

INFO ALL
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Server
redis_version:6.2.11
dfly_version:df-v1.6.2
redis_mode:standalone
arch_bits:64
multiplexing_api:iouring
tcp_port:6379
uptime_in_seconds:2512235
uptime_in_days:29

# Clients
connected_clients:1449
client_read_buf_capacity:28684544
blocked_clients:0

# Memory
used_memory:12720191536
used_memory_human:11.85GiB
used_memory_peak:12720645304
comitted_memory:-1120105296160
used_memory_rss:19089059840
used_memory_rss_human:17.78GiB
object_used_memory:10009989304801
table_used_memory:2151049008
num_buckets:40825680
num_entries:37334314
inline_keys:21411245
strval_bytes:191391928
updateval_amount:10172386230460
listpack_blobs:871
listpack_bytes:354124
small_string_bytes:708661960
pipeline_cache_bytes:235417156
maxmemory:268435456000
maxmemory_human:250.00GiB
cache_mode:store
maxmemory_policy:noeviction

# Stats
total_connections_received:3148610
total_commands_processed:1615529695622
instantaneous_ops_per_sec:659942
total_pipelined_commands:1608255975356
total_net_input_bytes:1163484033688993
total_net_output_bytes:379361264694750
instantaneous_input_kbps:-1
instantaneous_output_kbps:-1
rejected_connections:-1
expired_keys:851878734
evicted_keys:0
hard_evictions:0
garbage_checked:218869167
garbage_collected:188089573
bump_ups:0
stash_unloaded:179170
oom_rejections:0
traverse_ttl_sec:0
delete_ttl_sec:0
keyspace_hits:1029139371082
keyspace_misses:175574394678
total_reads_processed:47486081831
total_writes_processed:372819118247
async_writes_count:0
defrag_attempt_total:0
defrag_realloc_total:0
defrag_task_invocation_total:0

# TIERED
tiered_entries:0
tiered_bytes:0
tiered_reads:0
tiered_writes:0
tiered_reserved:0
tiered_capacity:0
tiered_aborted_write_total:0
tiered_flush_skip_total:0

# PERSISTENCE
last_save:1695208814
last_save_duration_sec:14
last_save_file:XXXX
loading:0
rdb_hash:3851807
rdb_set:20773514
rdb_other:2922
rdb_string:12703867
rdb_changes_since_last_save:29117824

# Replication
role:master
connected_slaves:1
slave0:ip=XXXXX
master_replid:XXXXX

# Commandstats
cmd_AUTH:3220128
cmd_CLIENT:145285
cmd_DECR:2140
cmd_DEL:14074
cmd_DFLY:292
cmd_EVALSHA:1376106
cmd_EXEC:54787850
cmd_GET:3466267100
cmd_HELLO:145884
cmd_HGET:1
cmd_HGETALL:1
cmd_HINCRBY:106259537
cmd_HINCRBYFLOAT:41519
cmd_HLEN:686958
cmd_HMGET:1197018221699
cmd_HSET:412024498850
cmd_HSETNX:2757394
cmd_INCR:967762172
cmd_INFO:4
cmd_MULTI:54787851
cmd_PING:301054649
cmd_QUIT:518
cmd_REPLCONF:404909075
cmd_ROLE:2932836
cmd_SADD:12397582
cmd_SET:673780799
cmd_SETEX:186300646
cmd_SETNX:689437
cmd_SREM:468851
cmd_SSCAN:248253
cmd_ZADD:656785
cmd_ZREM:13748
cmd_ZREVRANGEBYSCORE:4949481
cmd_ZREVRANK:202258524

# Errorstats
[2]race condition:66
[1]race condition:2140
EXEC without MULTI:1
syntax_error:1
protocol_error:4
syncid not found:47
Can not execute during LOADING:38030460

# Keyspace
db0:keys=37334314,expires=8850689,avg_ttl=-1

# CPU
used_cpu_sys:20515437.68552
used_cpu_user:53587176.902686
used_cpu_sys_children:0.0
used_cpu_user_children:0.0
used_cpu_sys_main_thread:398022.5938
used_cpu_user_main_thread:1095559.629527

# Cluster
cluster_enabled:0

romange (Collaborator) commented Jan 21, 2024

Some updates on the issue. In case hundreds of connections or more are created at the same time, Linux moves the first backlog-many established connections into the accept queue. All the others are put into a special SYN queue (https://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html). For these connections, the Linux server does not send SYN-ACK responses (i.e. it silently ignores them), which triggers the re-transmit flow on the client side. The re-transmit flow has exponential back-off, with the initial value hardcoded in Linux to 1 second; see https://serverfault.com/questions/1043758/initial-syn-timeout-on-linux

So, if our backlog is 128, more than 128 clients try to connect, and Dragonfly has not immediately unloaded them from the accept queue, then some of the connections will find themselves in the SYN queue, which automatically prolongs their connect time to at least 1 second.
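
For anyone debugging this on their own machine, here is a small sketch (assuming a Linux host) that prints the kernel knobs involved: net.core.somaxconn caps whatever backlog the server passes to listen(2), tcp_max_syn_backlog bounds the SYN queue, and tcp_syn_retries controls how many times the client retransmits the SYN with that exponential back-off.

package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    for _, p := range []string{
        "/proc/sys/net/core/somaxconn",           // cap on the listen(2) backlog
        "/proc/sys/net/ipv4/tcp_max_syn_backlog", // size of the SYN queue
        "/proc/sys/net/ipv4/tcp_syn_retries",     // client-side SYN retransmit attempts
    } {
        data, err := os.ReadFile(p)
        if err != nil {
            fmt.Printf("%s: %v\n", p, err)
            continue
        }
        fmt.Printf("%s = %s\n", p, strings.TrimSpace(string(data)))
    }
}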

romange added the "Next Up" label (task that is ready to be worked on and should be added to working queue) on Jan 21, 2024
chakaz (Collaborator) commented Jan 22, 2024

@romange do I read your last comment correctly in that you are OK with implementing backlog as a configurable flag?

romange (Collaborator) commented Jan 22, 2024

Yes :) now we understand the context and why this flag can be useful.

chakaz (Collaborator) commented Jan 22, 2024

You had me at "some of the connections will [...] prolong the connect time for them to at least 1 second" 🤣
