-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seg Fault in cmdMessageIDCompare #880
Comments
There is a fix in the develop branch for #867 which may affect this. |
I'll give it a try on Monday and see if it works (Bank Holiday today). Thanks. |
I applied this patch: Unfortunately, I still see the issue. I have attached a protocol trace. I can make other traces if necessary. |
Maybe I can offer an idea, where to check. I have not had this issue myself, but I like to find possible races ;)
For me this looks like the content of a list element (probably the command queue) has already been freed, but not unlinked from list. Looking at the function MQTTAsync_assignMsgId it might be possible that either command queue or response queue might be affected. I think 1.3.2 did not include the outbound message queue yet: Probably also locking the "mqttcommand_mutex" which seems to guard the command list could help. My assumption is that one thread is currently unlinking an item from the List and after it is being set to NULL it is either interrupted (probably here: https://github.com/eclipse/paho.mqtt.c/blob/146d210addda572870ee56accbce6cc7bb5f129b/src/LinkedList.c#L227) or another thread reaches this element while checking for active commands. Probably another race like in the referenced issue #867. |
Well the simplest approaches are to a) @atomcraft-th supply me with a test program or b) @atomcraft-th to try with the command mutex locked. I just tried a build with that lock included, and that passed the regression tests. |
Thanks, I'll try that. FYI, when I studied the call stack a little, the 'a' parameter which is null comes from |
That is what I was mentioning above. The List entry is going to be removed. Its content is already set to NULL before being dequeued. When during this time another thread dereferences this element (what cmdMessageIDCompare is obviously doing) it segfaults as it accesses a NULL pointer. The main issue with the suggested fix (c7f6161) with also taking the command lock, is that the other lists (responses and outboundmsgs) are still unguarded. I don't know if this is a real problem but in theory it is. Maybe a generic per-list-lock inside List Operations would come in handy. |
Describe the bug
I am seeing a segmentation fault, it isn't easy to hit, although the more threads I have sending through mqtt the more likely it seems to be.
It always seems to occur when two threads are trying to send messages concurrently.
It is always in cmdMessageIDCompare, always with a=0x0 and always called by ListFindItem.
Callstack for seg fault:
#0 0x001fd980 in cmdMessageIDCompare (a=0x0, b=0x7effef70) at /home/pi/dev/build/libraries/mqtt/src_c/src/MQTTAsync.c:3442
#1 0x00211b88 in ListFindItem (aList=0x317dc0, content=0x7effef70, callback=0x1fd960 ) at /home/pi/dev/build/libraries/mqtt/src_c/src/LinkedList.c:178
#2 0x001fdabc in MQTTAsync_assignMsgId (m=0x4019d0) at /home/pi/dev/build/libraries/mqtt/src_c/src/MQTTAsync.c:3470
#3 0x001fe8a0 in MQTTAsync_send (handle=0x4019d0, destinationName=0x471af0 "status/driver/06/32/status", payloadlen=4, payload=0x43c6ec, qos=1, retained=0, response=0x7efff008)
at /home/pi/dev/build/libraries/mqtt/src_c/src/MQTTAsync.c:3740
#4 0x001fecd4 in MQTTAsync_sendMessage (handle=0x4019d0, destinationName=0x471af0 "status/driver/06/32/status", message=0x415c14, response=0x7efff008) at /home/pi/dev/build/libraries/mqtt/src_c/src/MQTTAsync.c:3827
#5 0x001d525c in mqtt::async_client::publish (this=0x40169c, msg=std::shared_ptr (use count 2, weak count 0) = {...}) at /home/pi/dev/build/libraries/mqtt/src/src/async_client.cpp:565
#6 0x001d4e8c in mqtt::async_client::publish (this=0x40169c, topic=..., payload=0x7efff144, n=4, qos=1, retained=false) at /home/pi/dev/build/libraries/mqtt/src/src/async_client.cpp:539
#7 ...
Call stack for additional send thread
#0 __lll_lock_wait (futex=futex@entry=0x28e4b0 <mqttasync_mutex_store>, private=0) at lowlevellock.c:43
#1 0x76f84f44 in __GI___pthread_mutex_lock (mutex=0x28e4b0 <mqttasync_mutex_store>) at pthread_mutex_lock.c:80
#2 0x0020df58 in Thread_lock_mutex (mutex=0x28e4b0 <mqttasync_mutex_store>) at /home/pi/dev/build/libraries/mqtt/src_c/src/Thread.c:121
#3 0x001f5648 in MQTTAsync_lock_mutex (amutex=0x28e4b0 <mqttasync_mutex_store>) at /home/pi/dev/build/libraries/mqtt/src_c/src/MQTTAsync.c:541
#4 0x001fda34 in MQTTAsync_assignMsgId (m=0x4019d0) at /home/pi/dev/build/libraries/mqtt/src_c/src/MQTTAsync.c:3465
#5 0x001fe8a0 in MQTTAsync_send (handle=0x4019d0, destinationName=0x73fc55f0 "driver/13/01/measurements/current", payloadlen=1, payload=0x73f7867c, qos=1, retained=0, response=0x750fd8b0)
at /home/pi/dev/build/libraries/mqtt/src_c/src/MQTTAsync.c:3740
#6 0x001fecd4 in MQTTAsync_sendMessage (handle=0x4019d0, destinationName=0x73fc55f0 "driver/13/01/measurements/current", message=0x73f05b24, response=0x750fd8b0) at /home/pi/dev/build/libraries/mqtt/src_c/src/MQTTAsync.c:3827
#7 0x001d525c in mqtt::async_client::publish (this=0x40169c, msg=std::shared_ptr (use count 2, weak count 0) = {...}) at /home/pi/dev/build/libraries/mqtt/src/src/async_client.cpp:565
#8 0x001d4e8c in mqtt::async_client::publish (this=0x40169c, topic=..., payload=0x750fda3c, n=1, qos=1, retained=false) at /home/pi/dev/build/libraries/mqtt/src/src/async_client.cpp:539
#9 ...
To Reproduce
I'm struggling to reproduce with tracing enabled, I'll keep trying with different options.
Expected behavior
Not a seg fault
** Environment (please complete the following information):**
The text was updated successfully, but these errors were encountered: