AsyncClient: Possible race in MQTTAsync_assignMsgId results in the same msgid being used for multiple messages #867
Comments
…ultiple times, if one client uses multiple threads

Fixes eclipse-paho#867

Signed-off-by: Michael Trensch <mtrensch@gmail.com>
I see the problem. Thanks very much for the PR. Were you able to verify that the change fixed the problem?
After applying the fix my problem was gone, and a long-term test with paho mqtt c 1.3.2, with only this pull request applied, has now been running for 44 hours. Paho's receiver thread is still at 3% CPU usage and my whole program holds steady at 22% CPU. Memory usage has stayed at 0.2% for the full 44 hours. So memory and CPU usage no longer rise and the problem is gone. Before applying the pull request my program died after 3-4 hours, with CPU and memory usage rising during that period. When I am back at work, I might try to deliver a program/test that shows the problem and verifies it is gone, if you like. But I am not sure a reliable test can be written, as the race only happens occasionally and you cannot access the affected functions directly.
You don't need to do any more, unless you really want to. I can see that it's a problem, I missed the initializations when the lock was added. I just wanted to get an idea of whether there were any other problems lurking. Thanks again.
You are welcome. It was not easy to detect, as I searched in the wrong location at first, but the good (protocol) tracing mechanism in the library helped a lot.
Fetch the client's message ID in locked state to prevent using the same msgid for different messages (Issue #867)
Describe the bug
I have a project that publishes about 100 QoS 1/2 messages at once, from multiple threads, every 250 ms. After some time the memory and CPU usage of the receiver thread begin to rise.
Looking at the internal list "m->responses", I can see that old msgids are still present even though their PUBCOMP has already been received.
It seems that the same msgid is used for multiple messages.
Looking at the function "MQTTAsync_assignMsgId" shows a possible race when it is not called from an internal thread: the previous msgid is fetched before the mutex is locked, and the new one is assigned inside the locked context. If thread B is blocked on the mutex while thread A stores a new msgid, "m->c->msgID" can be overwritten with the stale value thread B read earlier. Since the message is not in the command list yet, the check against messages in the command queue does not help.
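To make the ordering concrete, here is a minimal sketch of the pattern described above, assuming a plain pthread mutex. The function names, the `struct client` stand-in, and the mutex are illustrative, not the library's actual code, and the real function additionally skips IDs still present in the command and outbound queues:

```c
#include <pthread.h>

#define MAX_MSG_ID 65535

/* Minimal stand-ins for the client structure and the library's mutex. */
struct client { int msgID; };
static pthread_mutex_t msgid_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Racy pattern: the previous msgid is read BEFORE the mutex is taken. Two
 * external threads can read the same value, both increment it, and both
 * store the same "new" msgid, so two messages go out with one ID.        */
static int assign_msgid_racy(struct client *c)
{
	int msgid = c->msgID;                 /* unprotected read */
	pthread_mutex_lock(&msgid_mutex);
	msgid = (msgid == MAX_MSG_ID) ? 1 : msgid + 1;
	c->msgID = msgid;                     /* may clobber a newer value */
	pthread_mutex_unlock(&msgid_mutex);
	return msgid;
}

/* Fixed pattern (what the pull request does conceptually): fetch the
 * current msgid only after the mutex is held, so every caller sees the
 * value written by the previous caller.                                  */
static int assign_msgid_fixed(struct client *c)
{
	int msgid;

	pthread_mutex_lock(&msgid_mutex);
	msgid = c->msgID;                     /* read under the lock */
	msgid = (msgid == MAX_MSG_ID) ? 1 : msgid + 1;
	c->msgID = msgid;
	pthread_mutex_unlock(&msgid_mutex);
	return msgid;
}
```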
To Reproduce
I currently don't have a reproduction setup, but I think the description above explains the problem quite well.
I added a print statement for the case where a message from the response queue does not match the incoming PUBCOMP (see https://github.com/mtrensch/paho.mqtt.c/blob/3148fe2d5f4b87e16266dfe559c0764e16ca0546/src/MQTTAsync.c#L4126).
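A check of that kind might look roughly like the following. This is a hedged, illustrative sketch only; the function and parameter names are placeholders, not the actual variables in the linked commit:

```c
#include <stdio.h>

/* Hypothetical diagnostic: warn when the incoming PUBCOMP's message ID does
 * not match the message ID of the response expected to complete next.      */
static void warn_on_msgid_mismatch(int pubcomp_msgid, int queued_response_msgid)
{
	if (pubcomp_msgid != queued_response_msgid)
		printf("PUBCOMP msgid %d does not match queued response msgid %d\n",
			pubcomp_msgid, queued_response_msgid);
}
```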
This produced the following protocol log (I stripped unnecessary debugging output)
Expected behavior
I would expect that all active messages have a unique msgid.
Environment (please complete the following information):
Additional context
This affects multi-threaded environments where a single client is used from multiple threads. At QoS 0 it does not cause any harm, but at higher QoS levels the command queue keeps growing and is never cleaned up, since an acknowledgement can only be matched to one of the messages sharing a msgid.
I have now signed the ECLA and opened pull request #868.