evconnlistener_new() fails #81

Open
mdavidsaver opened this issue Aug 14, 2024 · 4 comments

mdavidsaver commented Aug 14, 2024

Observed on a Debian 12 host, in an IOC that is one of 34 PVA servers started in parallel during system boot. A later restart succeeds.

...: Error in pvxsBaseRegistrar : bad_alloc serverconn.cpp:448

pvxs/src/serverconn.cpp

Lines 447 to 449 in 5fa743d

const int backlog = 4;
listener = evlisten(__FILE__, __LINE__,
evconnlistener_new(server->acceptor_loop.base, onConnS, this, LEV_OPT_DISABLED|LEV_OPT_CLOSE_ON_EXEC, backlog, sock.sock));

evconnlistener_new() can fail from malloc() (seems unlikely) or listen(). The latter seems more likely. By this point the socket is already bound, and I had thought that a failure to bind was the only reason listen() could fail (aside from allocation...).

Maybe bind() on Linux is lazier than I thought?

ev_owned_ptr<> should be changed to capture SOCKERRNO.
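
A minimal sketch of what capturing the error code could look like. `check_alloc()` is a hypothetical stand-in, not the actual pvxs `evlisten()` or `ev_owned_ptr<>` code. It assumes that evconnlistener_new() leaves the `errno` value from a failed listen() untouched when it returns NULL, and it reads plain `errno` where pvxs would presumably use SOCKERRNO for portability.

```cpp
#include <cerrno>
#include <new>
#include <string>
#include <system_error>

// Hypothetical helper (not the pvxs API): wraps a call that returns a
// pointer and reports *why* it failed instead of always throwing bad_alloc.
template<typename T>
T* check_alloc(const char* file, int line, T* ptr)
{
    // Save errno before anything else; building the message below
    // may allocate and overwrite it.
    const int err = errno;
    if(!ptr) {
        if(err == 0)
            throw std::bad_alloc(); // no error code available, keep old behavior
        std::string where = std::string(file) + ":" + std::to_string(line);
        throw std::system_error(err, std::generic_category(), where);
    }
    return ptr;
}
```

With something like this around the evconnlistener_new() call, the startup message would be expected to name the underlying error (for example EADDRINUSE) rather than only `bad_alloc`. Since `errno` may hold a stale value from an earlier call, a caller would likely want to set `errno = 0` immediately before the wrapped call.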

@mdavidsaver mdavidsaver added the bug Something isn't working label Aug 14, 2024
@mdavidsaver mdavidsaver self-assigned this Aug 14, 2024

mdavidsaver commented

Well, surprise. listen() for IPv4 on Linux is lazier than I expected, and there is a confirmed and documented race condition...

From inet_csk_listen_start():

	/* There is race window here: we announce ourselves listening,
	 * but this transition is still not validated by get_port().
	 * It is OK, because this socket enters to hash table only
	 * after validation is complete.
	 */

https://github.com/torvalds/linux/blob/9d5906799f7d89c9e12f6d2e0fccb00713c945ab/net/ipv4/inet_connection_sock.c#L1343-L1347

mdavidsaver commented

In ServIface::ServIface() this could be handled by moving the evconnlistener_new() call into the TCP port selection loop, so that a failed listen() is treated like a failed bind(). Hopefully I can figure out a way to trigger this situation, as some details need further exploration. E.g., according to this post both racers can fail.
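
The idea of folding listen() into the port selection loop can be sketched with plain POSIX sockets. This is not pvxs or libevent code: `listen_with_fallback()`, its parameters, the loopback address, and the "preferred port first, then OS-chosen port" policy are all assumptions made for illustration. The backlog of 4 matches the value quoted from serverconn.cpp.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: try `preferred` first, then let the OS pick a port
// (port 0) on the remaining attempts. Returns a listening fd and stores the
// bound port in *chosen, or returns -1 with errno set.
int listen_with_fallback(uint16_t preferred, unsigned attempts, uint16_t* chosen)
{
    int lasterr = EADDRINUSE;
    for(unsigned i = 0; i < attempts; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if(fd < 0)
            return -1;

        sockaddr_in addr;
        std::memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(i == 0 ? preferred : 0);

        // bind() and listen() are treated as one step: the port only counts
        // as claimed once listen() has also succeeded.
        if(bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0
           && listen(fd, 4) == 0)
        {
            socklen_t len = sizeof(addr);
            if(getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) == 0) {
                *chosen = ntohs(addr.sin_port);
                return fd;
            }
        }

        lasterr = errno;   // save before close(), which may change errno
        close(fd);
        if(lasterr != EADDRINUSE)
            break;         // only an address conflict is worth retrying
    }
    errno = lasterr;
    return -1;
}
```

Because the socket is closed and recreated after either call fails, a loser of the race simply moves on to another port instead of aborting startup. If both racers can fail, as the linked post suggests, each of them would retry independently.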

mdavidsaver commented

Now I think I understand why RSRV tests both bind() and listen() at once...

            if(bind(tcpsock, &scratch.sa, sizeof(scratch))==0 && listen(tcpsock, 20)==0) {

https://github.com/epics-base/epics-base/blob/86cdfc596f402657401b07c3a38493159022d023/modules/database/src/ioc/rsrv/caservertask.c#L195

