Assertion failures: stream_engine.cpp:94 and Invalid argument (ip.cpp:136) libzmq4x on OSX #1442

Closed
ukeller opened this issue Jun 16, 2015 · 1 comment · Fixed by #2292

ukeller commented Jun 16, 2015

Hey,
I'm sporadically hitting this assertion in libzmq4x on OSX.
The code at that location is:

int set = 1;
rc = setsockopt (s, SOL_SOCKET, SO_NOSIGPIPE, &set, sizeof (int));
errno_assert (rc == 0);

It seems that this setsockopt call can fail on OSX if the remote end has already closed or reset the connection, as described here:
http://lists.apple.com/archives/macnetworkprog/2008/Jan/msg00084.html

To illustrate this for OSX I created a small test program (single thread no zmq):
https://gist.github.com/ukeller/33734f95b00e36c0e404
It creates a server socket and a connector socket. The connector socket connects to the server socket over the network. The connector socket is then closed after accept returns on the server side and before setsockopt is called on the server side. This effectively simulates a client that disconnects immediately after connecting, which triggers the assertion.

To test, save the gist as socket.cc, then compile and run:

clang++ -std=c++1y socket.cc  -stdlib=libc++ 
 ./a.out nolinger stream_engine.cpp  #PASSES
 ./a.out linger0 stream_engine.cpp  #FAILS
 ./a.out nolinger ip.cpp  #PASSES
 ./a.out linger0 ip.cpp  #FAILS
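
For readers who cannot open the gist, the sequence described above can be sketched roughly as follows. This is not the gist itself: the function name nosigpipe_after_peer_close is made up for illustration, the sketch connects over loopback with a kernel-chosen port, and it has not been confirmed that loopback triggers the failure the same way the original test does. On platforms without SO_NOSIGPIPE the setsockopt step is skipped and the function reports success.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>

// Hypothetical helper (not taken from the gist or from libzmq).
// Returns 0 if SO_NOSIGPIPE could be set on the accepted socket (or the
// option does not exist on this platform), the errno value if setsockopt
// failed, or -1 if the socket setup itself failed.
int nosigpipe_after_peer_close (bool linger0)
{
    int listener = socket (AF_INET, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;

    sockaddr_in addr;
    std::memset (&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl (INADDR_LOOPBACK); // assumption: loopback
    addr.sin_port = 0;                              // let the kernel pick a port
    socklen_t len = sizeof addr;

    int client = -1;
    int accepted = -1;
    int result = -1;

    if (bind (listener, reinterpret_cast<sockaddr *> (&addr), sizeof addr) == 0
        && listen (listener, 1) == 0
        && getsockname (listener, reinterpret_cast<sockaddr *> (&addr), &len) == 0
        && (client = socket (AF_INET, SOCK_STREAM, 0)) >= 0
        && connect (client, reinterpret_cast<sockaddr *> (&addr), sizeof addr) == 0
        && (accepted = accept (listener, nullptr, nullptr)) >= 0) {
        if (linger0) {
            // Linger enabled with a zero timeout makes close() send a reset
            // instead of performing an orderly shutdown.
            struct linger lg;
            lg.l_onoff = 1;
            lg.l_linger = 0;
            setsockopt (client, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
        }
        // The peer goes away after accept returned but before the
        // server side touches the accepted socket.
        close (client);
        client = -1;

        result = 0;
#ifdef SO_NOSIGPIPE
        int set = 1;
        if (setsockopt (accepted, SOL_SOCKET, SO_NOSIGPIPE, &set, sizeof set) != 0)
            result = errno; // the libzmq code asserts at this point
#endif
    }

    if (client >= 0)
        close (client);
    if (accepted >= 0)
        close (accepted);
    close (listener);
    return result;
}
```

Calling nosigpipe_after_peer_close (false) corresponds to the nolinger runs above and nosigpipe_after_peer_close (true) to the linger0 runs; a non-zero return in the second case is the condition that errno_assert turns into an abort.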

Not sure what the fix is here. I'm tempted to say that the assertion should be disabled at least for Apple platforms. Any opinions?
Regards,
Urs

ukeller changed the title from "Assertion failure in stream_engine.cpp:94 libzmq4x on OSX" to "Assertion failures: stream_engine.cpp:94 and Invalid argument (ip.cpp:136) libzmq4x on OSX" on Jun 16, 2015

ukeller commented Jun 16, 2015

It seems that the same situation also triggers the Invalid argument assertion at ip.cpp:136; this one, however, is already fixed on 4.1 master.

bluca added a commit to bluca/libzmq that referenced this issue Jan 3, 2017
Solution: setsockopt returns EINVAL if the connection was closed by
the peer after the accept returned a valid socket. This is a valid
network error and should not cause an assert.
To handle this we have to extract the setsockopt from the stream
engine, as there's no clean way to return an error from the
constructor. Instead, try to set this option before creating the
engine in the callers, and return immediately as if the accept
had failed to avoid churn. Do the same for the connect calls.
Since this has to be done in 5 places (tcp/ipc connecter/listener and
socks connecter) add an utility function in ip.cpp.
Fixes zeromq#1442
bluca added a commit to bluca/libzmq that referenced this issue Jan 4, 2017
Solution: setsockopt returns EINVAL if the connection was closed by
the peer after the accept returned a valid socket. This is a valid
network error and should not cause an assert.
To handle this we have to extract the setsockopt from the stream
engine, as there's no clean way to return an error from the
constructor. Instead, try to set this option before creating the
engine in the callers, and return immediately as if the accept
had failed to avoid churn. Do the same for the connect calls by
setting the option in open_socket, so that the option for that
case is set even before connecting, so there's no possible race
condition.
Since this has to be done in 4 places (tcp/ipc listener, socks
connecter and open_socket) add an utility function in ip.cpp.
Fixes zeromq#1442
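
The approach described in the commit message above could look roughly like the sketch below: the option is set by a small helper that reports failure to its caller instead of asserting, and the listener path treats that failure like a failed accept. The names try_set_nosigpipe and accept_and_prepare are hypothetical and are not claimed to match the utility function actually added to ip.cpp.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>

// Hypothetical helper: returns 0 on success, or when the platform has no
// SO_NOSIGPIPE at all; returns -1 with errno set if setsockopt fails,
// e.g. with EINVAL because the peer already closed the connection.
int try_set_nosigpipe (int fd)
{
#ifdef SO_NOSIGPIPE
    int set = 1;
    return setsockopt (fd, SOL_SOCKET, SO_NOSIGPIPE, &set, sizeof set);
#else
    (void) fd; // nothing to do on this platform
    return 0;
#endif
}

// Hypothetical caller on the listener side: a failure to set the option
// is handled as if accept itself had failed, so no engine is created for
// a connection that is already gone.
int accept_and_prepare (int listener)
{
    int fd = accept (listener, nullptr, nullptr);
    if (fd < 0)
        return -1;

    if (try_set_nosigpipe (fd) != 0) {
        int saved = errno; // close() may overwrite errno
        close (fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Keeping the check in the caller rather than in a constructor follows the reasoning in the commit message: a constructor has no clean way to return an error, whereas the caller can simply skip the connection.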