Problem: zmq::signaler_t::send may loop forever #2362
Solution: restore the wsa_assert statement previously removed.
@@ -189,6 +189,7 @@ void zmq::signaler_t::send ()
     unsigned char dummy = 0;
     while (true) {
         int nbytes = ::send (w, (char*) &dummy, sizeof (dummy), 0);
+        wsa_assert (nbytes != SOCKET_ERROR);
         if (unlikely (nbytes == SOCKET_ERROR))
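A portable model of the loop touched by this hunk, to show why the restored line matters for the "loop forever" problem. It assumes that the rest of the loop (not shown in the hunk) retries on SOCKET_ERROR and otherwise checks nbytes == sizeof (dummy), and that wsa_assert aborts on every Winsock error except WSAEWOULDBLOCK. SendResult, Outcome, model_signaler_send and the constants are hypothetical names for this sketch, not libzmq code; max_attempts exists only so the model can report a loop that would otherwise spin forever.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the Winsock values (not libzmq code).
constexpr int kSocketError = -1;       // SOCKET_ERROR
constexpr int kWsaEWouldBlock = 10035; // WSAEWOULDBLOCK

struct SendResult
{
    int nbytes;     // what ::send returned
    int last_error; // what WSAGetLastError () would report
};

enum class Outcome
{
    Sent,    // one byte went out
    Aborted, // wsa_assert or zmq_assert would have fired
    GaveUp   // still looping after max_attempts (the real loop never gives up)
};

// Models the while (true) loop in zmq::signaler_t::send ().
// with_wsa_assert toggles the line restored by this PR.
Outcome model_signaler_send (const std::function<SendResult ()> &try_send,
                             bool with_wsa_assert,
                             int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        const SendResult r = try_send ();
        // wsa_assert (nbytes != SOCKET_ERROR): assumed to abort on every
        // Winsock error except WSAEWOULDBLOCK.
        if (with_wsa_assert && r.nbytes == kSocketError
            && r.last_error != kWsaEWouldBlock)
            return Outcome::Aborted;
        // if (unlikely (nbytes == SOCKET_ERROR)): retry.
        if (r.nbytes == kSocketError)
            continue;
        // zmq_assert (nbytes == sizeof (dummy)), then break.
        return r.nbytes == 1 ? Outcome::Sent : Outcome::Aborted;
    }
    return Outcome::GaveUp;
}
```

With a send that always fails with a hard error, the model without wsa_assert exhausts max_attempts (the real loop would spin forever), while the model with it aborts on the first attempt. A send that reports WSAEWOULDBLOCK twice and then succeeds is retried and sent either way.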
Given the assert above, can nbytes ever be == SOCKET_ERROR?
I am not so sure it can't be. I am watching this in my project, and things are going pretty well so far. Without this patch, zmq_assert fails very often in my program.
I think the previous pull request should be reverted.
Version 4.1.5 had this code:
    unsigned char dummy = 0;
    int nbytes = ::send (w, (char*) &dummy, sizeof (dummy), 0);
    wsa_assert (nbytes != SOCKET_ERROR);
    zmq_assert (nbytes == sizeof (dummy));
To me it seems that the problem in at least one of the reported crashes (#2358) was
    zmq_assert (nbytes == sizeof (dummy));
and not the wsa_assert.
At least one of the other bugs mentioned in #2360 is from a different platform (Ubuntu, not Windows). Since this code is not the same on both platforms, I think the problem is somewhere else.
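The 4.1.5 sequence can be modeled the same way to check which assertion fires for a given send result. This sketch assumes, as reported later in this thread, that wsa_assert stays silent for WSAEWOULDBLOCK; under that assumption, nbytes == -1 with WSAEWOULDBLOCK passes the wsa_assert and trips the zmq_assert, which fits the reading of #2358 given here. FailedCheck and first_failing_check_415 are hypothetical names.

```cpp
#include <cassert>

constexpr int kSocketError = -1;       // SOCKET_ERROR
constexpr int kWsaEWouldBlock = 10035; // WSAEWOULDBLOCK

enum class FailedCheck { None, WsaAssert, ZmqAssert };

// Which check in the 4.1.5 sequence fires for a given ::send result:
//   wsa_assert (nbytes != SOCKET_ERROR);
//   zmq_assert (nbytes == sizeof (dummy));
FailedCheck first_failing_check_415 (int nbytes, int last_error)
{
    const int expected = static_cast<int> (sizeof (unsigned char)); // dummy
    // wsa_assert: assumed to stay silent for WSAEWOULDBLOCK.
    if (nbytes == kSocketError && last_error != kWsaEWouldBlock)
        return FailedCheck::WsaAssert;
    if (nbytes != expected)
        return FailedCheck::ZmqAssert;
    return FailedCheck::None;
}
```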
Seems strange, but OK.
I will look at this, but to me it seems that both send on Windows and write on Linux can return without an error yet send fewer bytes than expected (8 on Linux, 1 on Windows).
On Windows this is funny, because it basically means it can send 0 bytes without a problem :-)
But it seems to be possible, otherwise there would not be those reports.
However:
#2339 reports 4.2.0 line 185, which is Windows code, but it was reported on Ubuntu!
So either the reported version or the platform is wrong.
#2303 reports no line number, so we do not know which of these two lines is the problem:
    wsa_assert (nbytes != SOCKET_ERROR);
    zmq_assert (nbytes == sizeof (dummy));
#2358 reports the line number and the version 4.1.5:
    zmq::signaler_t::send() Line 182 + 0x49 bytes
    https://github.com/zeromq/zeromq4-1/blob/v4.1.5/src/signaler.cpp#L182
which for me is clearly this line:
    zmq_assert (nbytes == sizeof (dummy));
So I think the correct solution would be to handle correctly the situation where fewer bytes than expected have been sent.
Currently the code asserts that the requested number of bytes is always sent, but this is clearly not the case.
I might open an issue for this later in the evening, or tomorrow, if no one else has done so in the meantime.
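A minimal sketch of handling short writes on a POSIX descriptor by looping until every byte is written, as an illustration of the proposal above (a Winsock version would loop over ::send the same way). write_all is a hypothetical helper, not part of libzmq; it assumes a blocking descriptor, retries on EINTR, and treats any other error (including EAGAIN) or a zero-byte write as failure so it cannot spin.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <sys/socket.h> // socketpair (), for the usage example
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper, not part of libzmq: keep calling write () until all
// len bytes have been written, so a short write is continued rather than
// asserted on.
bool write_all (int fd, const void *buf, std::size_t len)
{
    const char *p = static_cast<const char *> (buf);
    while (len > 0) {
        const ssize_t n = ::write (fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue; // interrupted by a signal before writing: retry
            return false; // real error (EAGAIN included, see lead-in)
        }
        if (n == 0)
            return false; // no progress: bail out instead of spinning
        p += n;
        len -= static_cast<std::size_t> (n);
    }
    return true;
}
```

For example, writing an 8-byte std::uint64_t with write_all to one end of a socketpair (AF_UNIX, SOCK_STREAM) and reading the other end yields the same value.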
#2339 is on Ubuntu, but running under Wine. The assertion that failed is the zmq_assert. But in this problem (at least in mine, for sure), nbytes is -1, not some other value less than sizeof(dummy). The statement
Thanks for the info about Wine, @nexcvon, this at least makes that clear. All three reports indicate that send had a problem. Is ignoring the problem the right solution? And finally, what does the current code solve?
It does not solve #2339.
I really suggest reverting the code to the state it was in before #2361.
Please. Thanks!
I can reproduce this problem using pyzmq 16.0.2 (with zmq 4.1.6). The error code is EWOULDBLOCK (I was wrong that it might be 0), so wsa_assert doesn't fail (see https://github.com/zeromq/libzmq/blob/master/src/err.cpp#L95 and https://github.com/zeromq/libzmq/blob/master/src/err.cpp#L119). I don't know why EWOULDBLOCK is returned, or how it should be handled correctly. Retrying seems better than crashing for now, at least until someone reports an infinite loop. I am not familiar with sockets, so I hope you, @a4z, can help improve this. Below is how I reproduce this problem using Python. I will try to get a C++ version working.
Modified zmq::signaler_t::send:
Python code:
Output:
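A portable model of the mechanism described above, assuming that zmq::wsa_error () returns NULL for WSAEWOULDBLOCK and that wsa_assert only aborts when that string is non-NULL (this is how the linked err.cpp lines are read here; check them for the exact code). Under that assumption, nbytes can still be SOCKET_ERROR right after wsa_assert (nbytes != SOCKET_ERROR), which answers the earlier question about the assert. model_wsa_error and model_wsa_assert_aborts are hypothetical names.

```cpp
#include <cassert>

constexpr int kWsaEWouldBlock = 10035; // WSAEWOULDBLOCK

// Model of zmq::wsa_error (): assumed to return NULL for WSAEWOULDBLOCK and
// an error message for any other code.
const char *model_wsa_error (int last_error)
{
    if (last_error == kWsaEWouldBlock)
        return nullptr;
    return "Winsock error"; // placeholder; the real text depends on the code
}

// Model of wsa_assert (cond): returns true where the macro would abort.
bool model_wsa_assert_aborts (bool cond, int last_error)
{
    if (cond)
        return false; // condition holds: nothing happens
    return model_wsa_error (last_error) != nullptr;
}
```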
I am at work today with unfortunately less time, and no Windows machine around me. @nexcvon, could you please test whether this solves it?
PS: looking at the code, the Linux version could also need some adaptation.
This is the signaler, so remember that on Linux it's not a TCP socket like on Windows. I honestly can't think of any reason why a send should succeed but write fewer than 8 bytes. But if you want to send a PR to check for that special case, please feel free to do so, as long as the overall behavior when there's an actual error is not changed.
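To illustrate the point about Linux: assuming a build where the signaler uses an eventfd (rather than a socketpair), each signal is an 8-byte counter update that the kernel transfers whole or rejects with an error, so a short count is not expected. efd_signal and efd_read_count are hypothetical helpers for this Linux-only sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helpers (Linux only). Writing adds the value to the kernel's
// 64-bit counter; without EFD_SEMAPHORE, reading returns the counter and
// resets it to 0.
bool efd_signal (int efd)
{
    const std::uint64_t inc = 1;
    // eventfd transfers exactly 8 bytes or fails, so no short-write handling.
    return ::write (efd, &inc, sizeof inc) == static_cast<ssize_t> (sizeof inc);
}

bool efd_read_count (int efd, std::uint64_t *count)
{
    return ::read (efd, count, sizeof *count)
           == static_cast<ssize_t> (sizeof *count);
}
```

Two signals followed by one read return a count of 2, because the counter accumulates until it is read.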
Yes, that is about as likely to ever happen as me touching the Linux code now ;-)
Update on #2358: I changed our code (we use 4.1.5), only in signaler_t::send(), to the code in this commit, and we no longer see the abort. Thanks.