
VMA TCP connect() call takes much longer than OS #1017

Open · Fed3n opened this issue Dec 1, 2022 · 4 comments

Fed3n commented Dec 1, 2022

Configuration:

  • Product version 9.6.4-0 (built from source)
  • OS Debian 11 x86_64
  • OFED 5.8-1.0.1.1
  • Hardware Mellanox ConnectX-5 Ex

I'm comparing TCP flow completion time under VMA against the OS stack, and I notice that a blocking connect() call under VMA takes about 1.5-2 ms, while the same call on the OS stack takes about 40 µs. VMA runs on both hosts with VMA_SPEC=latency and is compiled with --enable-tso.
To find the bottleneck, I instrumented the VMA stack and found that in the connect() path, sockinfo_tcp::prepare_dst_to_send and sockinfo::attach_as_uc_receiver together take 1.5-2 ms, while the subsequent lwip tcp_connect call takes only around 20 µs. Once the connection is established, send/recv latencies are much lower than on the OS stack.
Is this setup time for a new connection a known limitation of VMA, or might there be something wrong with my setup?
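
For context, the measurement harness for such a comparison can be as simple as timing a single blocking connect(); below is a minimal sketch (placeholder address and port, not the exact code used here):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(5001);                        /* placeholder port */
    inet_pton(AF_INET, "192.168.1.2", &addr.sin_addr);  /* placeholder server */

    int fd = socket(AF_INET, SOCK_STREAM, 0);

    /* Time only the blocking connect() itself. */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0)
        perror("connect");
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L + (t1.tv_nsec - t0.tv_nsec);
    printf("connect duration: %ld ns\n", ns);

    close(fd);
    return 0;
}
```

The same binary measures the kernel stack when run directly and VMA when run preloaded, e.g. `LD_PRELOAD=libvma.so VMA_SPEC=latency ./connect_bench`.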

igor-ivanov (Collaborator) commented

@Fed3n thank you for your analysis. Could you check the duration of a connect() to another server without closing the first connection? I suspect connect() might be slow only the first time.

igor-ivanov (Collaborator) commented

@Fed3n have you had a chance to verify my assumption?

Fed3n (Author) commented Jan 20, 2023

@igor-ivanov sorry, this slipped my mind.
I ran a simple experiment with three servers: one runs a client application that issues four alternating connects to the other two, which each run an accepting application. No connection is closed. Logging goes to a file, so hopefully it does not skew the measurements of the internal functions too much.
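
A minimal sketch of the client side of this experiment (placeholder addresses and port; the real harness also timed the internal VMA functions):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/* Time one blocking connect(); the socket is intentionally left open,
 * since no connection is closed during the experiment. */
static long timed_connect(const char *ip, int port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);

    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) * 1000000000L + (t1.tv_nsec - t0.tv_nsec);
}

int main(void)
{
    const char *servers[2] = { "192.168.1.2", "192.168.1.3" }; /* placeholders */

    /* Four connects, alternating between the two accepting servers. */
    for (int i = 0; i < 4; i++) {
        long ns = timed_connect(servers[i % 2], 5001);
        printf("SERVER%d CONN%d: total connect duration: %ld ns\n",
               i % 2 + 1, i / 2 + 1, ns);
    }
    return 0;
}
```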

Using VMA on all servers as described above:

| Connect       | prepare_to_send | attach_as_uc_receiver | Total connect duration |
|---------------|-----------------|-----------------------|------------------------|
| SERVER1 CONN1 | 9,087,475 ns    | 17,735,853 ns         | 26,930,693 ns          |
| SERVER2 CONN1 | 11,602 ns       | 1,676,412 ns          | 1,744,222 ns           |
| SERVER1 CONN2 | 3,937 ns        | 343,114 ns            | 376,918 ns             |
| SERVER2 CONN2 | 3,256 ns        | 209,800 ns            | 240,208 ns             |

Using OS Stack on all servers:

| Connect       | Total connect duration |
|---------------|------------------------|
| SERVER1 CONN1 | 51,556 ns              |
| SERVER2 CONN1 | 48,811 ns              |
| SERVER1 CONN2 | 49,933 ns              |
| SERVER2 CONN2 | 42,788 ns              |

You are absolutely right that only the very first connect takes a long time. Still, the attach_as_uc_receiver call seems to be the bottleneck even in the later connects...

igor-ivanov (Collaborator) commented

Thank you, @Fed3n. On the first connection, the ring-related resources are initialized.
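
If that one-time setup cost matters for an application, one possible mitigation (an assumption, not a maintainer-confirmed recommendation, and it presumes the ring resources stay initialized after the warm-up socket is closed) is a throwaway warm-up connect at startup, so the ring initialization happens before the first latency-sensitive connect():

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical warm-up: pay the one-time ring initialization cost at
 * startup with a throwaway connection, before any latency-critical
 * connect(). The address and port are placeholders. */
static void warmup_connect(const char *ip, int port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    connect(fd, (struct sockaddr *)&addr, sizeof(addr)); /* slow first connect paid here */
    close(fd);
}
```

Per the measurements above, later connects would still pay the smaller attach_as_uc_receiver cost; the warm-up only removes the very first connect's initialization overhead.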
