lwip_sock_tcp / sock_async: received events before calling sock_accept() are lost due to race condition. #16303
Labels
Area: network
Type: bug
Description
In the `tests/lwip` application, `tcp.c` defines a TCP server that receives a connection, dumps all received bytes to the terminal as hex, and prints messages when the connection is accepted and closed.

The normal flow of the test is that `_tcp_accept` is called when a new connection arrives on the server socket, and it calls `sock_tcp_accept()` to accept the connection. `sock_tcp_accept()` returns a socket number (file descriptor) to use with, for example, `sock_tcp_read()`; that socket number is only assigned to the underlying connection when `sock_tcp_accept()` is called.

However, underneath, the
`netconn` connection has already been accepted before `sock_tcp_accept()` is called: the SYN,ACK packet has already been sent, and the other end may already have sent bytes on this connection. All these events call the `_netconn_cb` function in `pkg/lwip/contrib/sock/tcp/lwip_sock.c`, but `conn->callback_arg.socket` for the given `netconn* conn` is set to -1 until `sock_tcp_accept()` is called and a socket value is assigned. The code in `_netconn_cb` ignores the callback altogether if the socket is not assigned, which means that both the received data and the FIN are ignored if they arrive before the application code has had a chance to both call `sock_tcp_accept()` AND register the recv callback with `sock_tcp_event_init(_ev_queue)`.

The easiest way to see this problem is by connecting to a TCP server in the
`tests/lwip` application and sending data immediately (for example, `echo hello | nc 1.2.3.4 1234` to a TCP server created with `tcp server start 1234`). The server should print the connected message, the hex dump of hello, and the reset message; instead, it only prints the connected message. See below for a programmatic way to reproduce it.

I'd propose to fix this issue by computing the
`flags` variable in `_netconn_cb` even if the socket is -1 and even if there's no callback registered, and storing a mask of "pending" flags somewhere in the `struct conn*` object, so that the TCP callback is called with those flags when it is registered with `sock_tcp_event_init`. Otherwise I don't see a way to use the API without potentially missing events.

Steps to reproduce the issue
Add the following test to `lwip/tests/01-run.py` and to the list of tests at the bottom of the file. Run it with: `RIOT_CI_BUILD=1 make QUIET=0 -C tests/lwip flash test`
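For a quick manual check outside the test suite, the same scenario can be driven from the host. The snippet below is not the test referred to above; it is only a minimal sketch that does what `echo hello | nc ...` does, using Python's standard `socket` module. The helper name `send_and_close` is hypothetical, and the host and port are assumptions that must match whatever `tcp server start` was given on the node.

```python
import socket


def send_and_close(host, port, payload=b"abcd", timeout=5.0):
    """Connect, send `payload` right away, then close the connection.

    Nothing is awaited between connect and send, so the data (and the
    FIN that follows) can reach the server before its application code
    has accepted the connection and registered its event callback.
    Returns the number of bytes handed to the socket.
    """
    # create_connection() resolves the address and handles IPv4/IPv6.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        # Half-close: tells the peer no more data follows (sends FIN).
        sock.shutdown(socket.SHUT_WR)
    return len(payload)
```

Calling, for example, `send_and_close("1.2.3.4", 1234)` against a server started with `tcp server start 1234` should, if the race described here is hit, leave the server printing only the connected message. The default payload `abcd` corresponds to the bytes `61 62 63 64`.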
Expected results
Test passes. Namely, `_tcp_recv` is called when either `sock.send()` or `sock.close()` is called.

Actual results
The example lwip app never calls `_tcp_recv`, so it never prints "00000000 61 62 63 64" on the terminal.

Versions