Problem: msg_t ownership inconsistent, with some leaks #2481
Comments
This method appears to leak a `msg_t`:

```c++
int zmq::sub_t::xsetsockopt (int option_, const void *optval_,
    size_t optvallen_)
{
    if (option_ != ZMQ_SUBSCRIBE && option_ != ZMQ_UNSUBSCRIBE) {
        errno = EINVAL;
        return -1;
    }

    //  Create the subscription message.
    msg_t msg;
    int rc = msg.init_size (optvallen_ + 1);
    errno_assert (rc == 0);
    unsigned char *data = (unsigned char *) msg.data ();
    if (option_ == ZMQ_SUBSCRIBE)
        *data = 1;
    else if (option_ == ZMQ_UNSUBSCRIBE)
        *data = 0;

    //  We explicitly allow a NULL subscription with size zero
    if (optvallen_) {
        assert (optval_);
        memcpy (data + 1, optval_, optvallen_);
    }

    //  Pass it further on in the stack.
    int err = 0;
    rc = xsub_t::xsend (&msg);
    if (rc != 0)
        err = errno;
    int rc2 = msg.close ();
    errno_assert (rc2 == 0);
    if (rc != 0)
        errno = err;
    return rc;
}
```
|
This method ignores the write failure (without commentary). If the write fails the message presumably leaks.

```c++
void zmq::xpub_t::xattach_pipe (pipe_t *pipe_, bool subscribe_to_all_)
{
    zmq_assert (pipe_);
    dist.attach (pipe_);

    //  If subscribe_to_all_ is specified, the caller would like to subscribe
    //  to all data on this pipe, implicitly.
    if (subscribe_to_all_)
        subscriptions.add (NULL, 0, pipe_);

    //  if welcome message exists
    if (welcome_msg.size () > 0) {
        msg_t copy;
        copy.init ();
        copy.copy (welcome_msg);

        pipe_->write (&copy);
        pipe_->flush ();
    }

    //  The pipe is active when attached. Let's read the subscriptions from
    //  it, if any.
    xread_activated (pipe_);
}
```
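A hedged sketch of one way to handle the ignored write failure, assuming `pipe_t::write` returns false when it does not take ownership of the message; this is illustrative only, not necessarily how it was eventually patched:

```c++
//  Sketch only: close the welcome-message copy ourselves when the pipe
//  refuses it, so its payload is not orphaned.
if (welcome_msg.size () > 0) {
    msg_t copy;
    int rc = copy.init ();
    errno_assert (rc == 0);
    rc = copy.copy (welcome_msg);
    errno_assert (rc == 0);

    if (pipe_->write (&copy))
        pipe_->flush ();
    else {
        //  The pipe did not accept the message; release it here.
        rc = copy.close ();
        errno_assert (rc == 0);
    }
}
```
|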
Assuming that not closing a received delimiter message is not a leak, the above three methods are the only ones out of the 52 `msg_t` constructions that I could see causing a leak. Most of the calls that utilize a message are asserted for success, crashing the process on failure. I'm not familiar enough with the code base to say whether those are valid assumptions, although in the case of ZAP message sending they were not. |
Did you experience a leak, or are those assumptions?
In some of your examples the msg was empty inside, so not closing it did not really leak anything (like with the delimiter). msg is always on the stack, never dynamically allocated; only the bytes inside are on the heap. So empty messages should be closed, but it is not a must. Anyway, I will take a deeper look later today.
…On Mar 30, 2017 00:24, "Eric Voskuil" wrote:
Assuming that not closing a received delimiter message is not a leak, the
above three methods are the only ones out of the 52 msg_t constructions
that I could see causing a leak. Most of calls that utilize a message are
asserted for success, crashing the process on failure. I'm not familiar
enough with the code base to say whether those are valid assumptions,
although in the case of ZAP message sending they were not.
|
If this allocates (or assumes ownership of) memory:

```c++
rc = control_->recv (&msg, 0);
```

then there is a real leak. This is not an assumption but a conclusion. This is public API:

```c++
int zmq_proxy (void *frontend_, void *backend_, void *capture_)
{
    if (!frontend_ || !backend_) {
        errno = EFAULT;
        return -1;
    }
    return zmq::proxy (
        (zmq::socket_base_t *) frontend_,
        (zmq::socket_base_t *) backend_,
        (zmq::socket_base_t *) capture_);
}
```

I speculated above that some messages may not allocate and therefore may not result in actual leaks, but this is horribly fragile. The caller cannot be expected to know whether the message payload is stack or heap allocated. |
This doesn't leak memory, because msg is a function-local variable; the next time you call receive it will close the current msg and create a new one. Receive always closes the msg and creates a new one. Send always returns an empty, initialized msg. With that said, zmq_proxy does leak one msg, because when the method exits we don't close the msg. That is a bug.
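A minimal sketch of the lifecycle being described, under the assumption that recv re-initializes the msg and a successful send leaves it empty; `forward_sketch` is an illustrative stand-in (using the internal `msg.hpp`/`socket_base.hpp` headers), not the actual proxy code:

```c++
#include "msg.hpp"
#include "socket_base.hpp"

//  Illustrative only: one msg_t lives for the whole loop; each recv releases
//  the previous payload and refills the msg, so only a final close on exit is
//  needed -- the close that zmq_proxy is said to omit.
static int forward_sketch (zmq::socket_base_t *from_, zmq::socket_base_t *to_)
{
    zmq::msg_t msg;
    int rc = msg.init ();
    if (rc != 0)
        return -1;

    while (true) {
        rc = from_->recv (&msg, 0);   //  closes previous content, refills msg
        if (rc != 0)
            break;
        rc = to_->send (&msg, 0);     //  on success leaves msg empty but initialized
        if (rc != 0)
            break;
    }

    const int rc2 = msg.close ();     //  release whatever the last recv left behind
    return (rc == 0 && rc2 == 0) ? 0 : -1;
}
```
|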
I looked at your examples; the code is indeed complicated, but there is no leak.
Copied here, notice the * when calling write: Initialized here:
https://github.com/zeromq/libzmq/blob/master/src/fq.cpp#L128
https://github.com/zeromq/libzmq/blob/master/src/lb.cpp#L146
However, you usually won't see msg.close all over the place, because of the pattern where receive and send close and re-initialize the msg; you usually see the msg being initialized once and closed once. So in a function it will be initialized at the beginning and closed at the end, and in a class it is initialized in the ctor and closed in the dtor.
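For illustration, a minimal sketch of that pattern (not actual libzmq code; `msg_holder_t` is a made-up name): one init when the owner is created, one close when it is destroyed, with recv/send keeping the msg initialized in between.

```c++
#include "msg.hpp"
#include "err.hpp"

class msg_holder_t
{
public:
    msg_holder_t ()
    {
        const int rc = _in_progress.init ();   //  initialized once, up front
        errno_assert (rc == 0);
    }

    ~msg_holder_t ()
    {
        const int rc = _in_progress.close ();  //  closed once, at end of life
        errno_assert (rc == 0);
    }

private:
    zmq::msg_t _in_progress;
};
```
|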
When a receive is successful a message payload may be allocated. If the message is not subsequently closed, it is a leak. You have not considered this case, which is the issue I've shown above in proxy (at least). |
The msg in the proxy example does get used again, so the next call to receive or send will close it. But, as said, proxy does (maybe) leak one msg for the entire process of the proxy: when the proxy exits it doesn't close the msg, so the last msg might be leaked, and this should be fixed. Chances are that the msg is empty, but either way it is a bug and the msg should be closed when the proxy exits. |
There is no way to predict whether the message will be empty. The caller controls the socket type.
One message leak is a leak. It is not a maybe.
The proxy lifetime is not the same as the process lifetime.
This is irrelevant, the message is not always closed after a successful read. |
The msg in proxy gets declared and initialized at the beginning of the method: https://github.com/zeromq/libzmq/blob/master/src/proxy.cpp#L115 We only need to close the msg once, when exiting the method (which we don't). Taking your example, to close the msg after the call to `control_->recv` we would need to immediately init it again anyway, because send and receive must work with an initialized msg. So it would look like this:

```c++
rc = control_->recv (&msg, 0);
msg.close ();
msg.init ();
```

Because of this, both calls are not needed. |
But it will get closed in the next call to send or recv |
Not in the cases where the method exits for another failure, which is the problem I raised (4 possible locations). |
agreed. following is a fix: |
For example, the second return leaks the message content allocation.

```c++
rc = control_->recv (&msg, 0);
if (unlikely (rc < 0))
    return -1;

moresz = sizeof more;
rc = control_->getsockopt (ZMQ_RCVMORE, &more, &moresz);
if (unlikely (rc < 0) || more)
    return -1;
```
yes, of course |
I'm less concerned with how to patch each instance, which is pretty straightforward once isolated, and more concerned with the fragility of the pattern employed. This is C++, we can use destructors. |
Actually we cannot... msg is a structure; it is allocated by every binding in other languages, which don't support destructors. So we must depend on the close method. I agree, it is complicated and delicate. |
Another way to put it: because zeromq exposes a C API, msg is not really C++, it is C, and C doesn't support destructors. |
The vast majority of these message allocations never see the public API. |
Not true, every time you send a message you are creating the msg structure. Take a look at all the bindings, CZMQ for example:

```c
struct _zframe_t {
    uint32_t tag;           //  Object tag for runtime detection
    zmq_msg_t zmsg;         //  zmq_msg_t blob for frame
    int more;               //  More flag, from last read
    uint32_t routing_id;    //  Routing ID back to sender, if any
};
```

https://github.com/zeromq/czmq/blob/master/src/zframe.c#L37

It is being declared on the stack. zmq_msg_t is actually not even a typedef for the class msg_t; it is a char array of the same size as msg_t (don't ask me why...), so there isn't even a constructor. The correct way to look at msg is as a C structure, not even C++. Which is why it doesn't use any ctors or dtors.
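Roughly, the public C entry points bridge that opaque blob to the internal class by casting and calling init/close explicitly; the following is a sketch of the pattern used in zmq.cpp, not a verbatim copy:

```c++
int zmq_msg_init (zmq_msg_t *msg_)
{
    //  Reinterpret the caller-provided bytes as the internal message class.
    return ((zmq::msg_t *) msg_)->init ();
}

int zmq_msg_close (zmq_msg_t *msg_)
{
    return ((zmq::msg_t *) msg_)->close ();
}
```
|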
I don't follow this response. It doesn't seem related to my observation. |
We cannot use a ctor and dtor, as msg_t is exposed by a C API which doesn't support those. The bindings, like C, Python, Java, allocate msg_t themselves, which means no constructor or destructor is used; they cannot call C++ code, only C. This is why msg uses close and init. |
Yes, I got that part. My point is that very few uses of the `msg_t` ever cross the public C API. I realize this is a stylistic issue within zmq, and I'm ok with it. But one should be aware of the risks of C style coding. The responsibilities and scope for error handling get spread out broadly. Just a casual walk through the code turned up a number of such issues for me today. |
The problem is that all over libzmq we only have a reference to msg; almost no place allocates it. The bindings are the ones that allocate, so a dtor and ctor won't save us from bugs. In the proxy case, yes, it would help, and maybe in some other cases, but those are not in the critical path of zeromq. Also, the mechanism of when to call close and when to call init is complicated, not only when going out of scope. I'm not sure it is possible to design such a class. But maybe. |
This is really not the case. There are 43 such calls within the library itself. Loss of scope is where object oriented approaches are most powerful, since function scope becomes irrelevant to the lifetime of a resource (as class scope takes over the resource). Only in the case of crossing the C API boundary is there a distinction. I realize this approach infects the entire library and would be hard to change. |
In the core of the library, msg_t is almost always passed by reference; the cases where msg_t is not by reference are very few, and in those you actually don't want to call close (proxy is not the core of the library; I'm talking about the send & recv flow). Almost everywhere else it is by reference. Do you want to convert those to stack? All of them? Part of them? Also take a look at the following: https://github.com/zeromq/libzmq/blob/master/src/fq.cpp#L88 The msg is being closed at the beginning of the scope and initialized at the end; you cannot use a class here, you must use a reference. I'm not sure where you want to have a stack msg (which would invoke the dtor), at what point? In the core path, there is none. For the non-core path, like proxy, we can have some wrapper, as sketched below.
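For the non-core paths, the kind of wrapper mentioned above could look roughly like this (a sketch only; `scoped_msg_t` is not an existing libzmq type): a scope guard that guarantees close() on every exit path, which is exactly what the early returns in proxy currently miss.

```c++
#include "msg.hpp"
#include "err.hpp"

struct scoped_msg_t
{
    scoped_msg_t ()  { const int rc = msg.init ();  errno_assert (rc == 0); }
    ~scoped_msg_t () { const int rc = msg.close (); errno_assert (rc == 0); }

    zmq::msg_t msg;
};

//  Usage sketch inside a proxy-like function: early returns no longer leak,
//  because the destructor closes the msg on scope exit.
//
//  scoped_msg_t guard;
//  int rc = control_->recv (&guard.msg, 0);
//  if (unlikely (rc < 0))
//      return -1;
```
|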
This is not a question of reference vs. pointer or stack vs. heap allocation, and there is nothing about a class that prevents reassignment of resources. |
This is getting a little far off course for me. I'm not advocating for rewriting the message object. I was merely pointing out that its use is fragile as a matter of design. I spent the day today scanning the code for problems that I anticipated would arise from such design, and found some. It's very hard to tell if there are or are not more. So it's a caution. Also as I remarked, documentation regarding resource responsibility could be more clear. |
I agree that this is fragile. |
I will issue a PR to patch up whatever actual issues exist above. When I get time to spend on some major zeromq work it'll probably be the UTF-8 Everywhere implementation that I promised @hintjens. |
Specific issues above resolved by #2486 |
@somdoron I did a little checking into this question of the lack of a `msg_t` destructor. AFAICT there is nothing that prevents this technique from working in the presence of a destructor on `msg_t`:

```c
typedef struct zmq_msg_t {
#if defined (__GNUC__) || defined (__INTEL_COMPILER) || \
    (defined (__SUNPRO_C) && __SUNPRO_C >= 0x590) || \
    (defined (__SUNPRO_CC) && __SUNPRO_CC >= 0x590)
    unsigned char _ [64] __attribute__ ((aligned (sizeof (void *))));
#elif defined (_MSC_VER) && (defined (_M_X64) || defined (_M_ARM64))
    __declspec (align (8)) unsigned char _ [64];
#elif defined (_MSC_VER) && (defined (_M_IX86) || defined (_M_ARM_ARMV7VE))
    __declspec (align (4)) unsigned char _ [64];
#else
    unsigned char _ [64];
#endif
} zmq_msg_t;
```

I'm not in a rush to do this, but IMO it seems entirely reasonable and would not only make the code safer, it would make it much smaller and more readable.
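To illustrate the claim, a hedged sketch of how the C shim could drive a `msg_t` that had a real constructor and destructor while `zmq_msg_t` stays this opaque, aligned char array; this is not the existing zmq.cpp code, just one way such a shim could look:

```c++
#include <new>

int zmq_msg_init (zmq_msg_t *msg_)
{
    //  Construct the C++ object in the caller-provided storage.
    new (static_cast<void *> (msg_)) zmq::msg_t ();
    return 0;
}

int zmq_msg_close (zmq_msg_t *msg_)
{
    //  Run the destructor explicitly; the storage itself belongs to the caller.
    reinterpret_cast<zmq::msg_t *> (msg_)->~msg_t ();
    return 0;
}
```
|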
Please please please be extremely careful with that... last time it was changed it was weeks of work to get it right without breaking ABI |
Of course. Any change to the `zmq_msg_t` structure would have to preserve the ABI. |
Given that there is no `msg_t` destructor, there are only two options for resource release: the owner (creator) of the message must free based on result code, or the code that generates and/or propagates the error must handle it.

A casual look at the code left me with the impression that the intent is the owner cleans up in the case of a send failure (but the sender does in the case of a success), and the reader cleans up in the case of a read failure (but the owner does in the case of success). It will take a little time to pin it down, since there are 52 message constructions, passed through various deep call stacks. But the first path I traced led to this:
Notice that this section returns false with a message that has been read:
Popping back up the stack we are here:
Notice that this read failure causes the return of a -1 (fail) code. Popping up again we are here:

So in this case a read call failure requires the owner to close the message that was reported as not read. Yet in the same code path a failure of the read in this line:

does not require that the message be closed by the caller. This then implies that it must always be safe to close an unclosed or a closed message, though this is unclear in the zmq_msg_close documentation.
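A minimal caller-side sketch of what "always safe to close" would have to mean at the public API level (an assumption, given the documentation is unclear): the caller can close unconditionally, whether the recv succeeded or failed.

```c++
#include <zmq.h>

int recv_one (void *socket_)
{
    zmq_msg_t msg;
    int rc = zmq_msg_init (&msg);
    if (rc != 0)
        return -1;

    rc = zmq_msg_recv (&msg, socket_, 0);
    //  ... use the message contents on success ...

    //  Unconditional close: only correct if closing a message left by a
    //  failed recv (or one already released internally) is guaranteed safe.
    zmq_msg_close (&msg);
    return rc >= 0 ? 0 : -1;
}
```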
Documentation also states that close is not required after any successful zmq_msg_send, but is silent about whether it must be called after a failed zmq_msg_read (it appears that it must be), a failed send, or a successful read.

In another section the behavior appears to leak on certain read failures:
Notice that, unlike in the previous example, the same pipe->read (msg_) failure in this case does not lead to the caller performing a cleanup. Now maybe certain types of read failures don't require the caller to clean up and others do, so it's ok (i.e. not a leak) for the reader to successfully read, not close the message, and yet return a failure (such as for a delimiter message). But that's not obvious (or commented) in the code.

IMO this C-style resource management (as opposed to RAII) makes the code fragile. By that I mean it's very hard to see errors, as resource management is a side concern of a large amount of code. Are there tests designed to detect leaks in failure conditions, or only in the case of success?
I did go through each code path that allocates `msg_t`. This one appeared to be a potential problem.

After a message is read successfully there are four conditions in which it will not be closed upon return.