Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MQTTAsync_terminate() to re-check the number of handles after re-acquisition of mqttasync_mutex #1124

Merged
merged 1 commit into from
Sep 30, 2021

Conversation

roscaviktor
Copy link
Contributor

@roscaviktor roscaviktor commented Jul 8, 2021

Description
Analyzing a few dumps revealed that the crash happens in the PAHO library in the background connection read thread, in function Socket_continueWrites(), which receives its socket argument pointing to de-allocated memory. Further investigation revealed that the cause for that is that the global variables, including variable mod_s in Socket.c, are cleaned up while the background threads are still alive.

Investigating further, the following race condition was found, leading to the possibility of having background threads running with global variables de-initialized:

  1. MQTTAsync_destroy() , after freeing resources allocated to the given MQTTAsync handle, it checks if this was the last one, in which case it also calls MQTTAsync_terminate() to terminate the background threads and uninitialize the global variables

  2. MQTTAsync_terminate() calls MQTTAsync_stop() which asks the background threads to terminate, by setting MQTTAsync_tostop to 1 (=true), and then waits for the background threads to actually terminate.

  3. During that wait, mqttasync_mutex is released (and re-aquired afterwards), which opens the possibility that an MQTTAsync_create() and even MQTTAsync_connect() are called in the meantime

  4. MQTTAsync_connect() resets MQTTAsync_tostop to 0 (false) and restarts the background threads if already terminated.

  5. Even in that case events at 3-4 happen, MQTTAsync_terminate() continues by destroying the global variables, effectively corrupting the global data used by the background threads (and also by future API calls).

The main part of the fix is, in MQTTAsync_terminate(), to re-check the number of handles after re-aquisition of mqttasync_mutex, and refrain from deleting the global variables if the number of handles is no longer zero.

Call stack
#0 0x00007fffdfb20d3b in Socket_continueWrites (pwset=0x7fffbfffebc0, sock=0x7fffbfffeba4)
at /builds/di/stork/raven/Raven/src/tools/externals/paho.mqtt.c/src/Socket.c:1001
#1 0x00007fffdfb1e4c8 in Socket_getReadySocket (more_work=0, tp=0x7fffbfffece0,
mutex=0x7fffdfd4d3a0 <socket_mutex_store>, rc=0x7fffbfffeca8)
at /builds/di/stork/raven/Raven/src/tools/externals/paho.mqtt.c/src/Socket.c:287
#2 0x00007fffdfb0ca72 in MQTTAsync_cycle (sock=0x7fffbfffedbc, timeout=1000, rc=0x7fffbfffedb8)
at /builds/di/stork/raven/Raven/src/tools/externals/paho.mqtt.c/src/MQTTAsyncUtils.c:2854
#3 0x00007fffdfb08589 in MQTTAsync_receiveThread (n=0x7fffe406eea8)
at /builds/di/stork/raven/Raven/src/tools/externals/paho.mqtt.c/src/MQTTAsyncUtils.c:1955
#4 0x00007ffff3ce56db in start_thread (arg=0x7fffbffff700) at pthread_create.c:463
#5 0x00007ffff2ecb71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

To Reproduce
The crash is sporadically reproduced by running automated tests that are stressing MQTT test connection functionality in our application.

Environment
Docker Image: Linux 5.8.0-53-generic #60~20.04.1-Ubuntu SMP Thu May 6 09:52:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Signed-off-by: Victor Rosca rosca.viktor@gmail.com

Thank you for your interest in this project managed by the Eclipse Foundation.

The guidelines for contributions can be found in the CONTRIBUTING.md file.

At a minimum, you must sign the Eclipse ECA, and sign off each commit.

To complete and submit a ECA, log into the Eclipse projects forge
You will need to create an account with the Eclipse Foundation if you have not already done so.
Be sure to use the same email address when you register for the account that you intend to use when you commit to Git.
Go to https://accounts.eclipse.org/user/eca to sign the Eclipse ECA.

…-acquisition of mqttasync_mutex, and refrain from deleting the global variables if the number of handles is no longer zero.

Signed-off-by: Victor Rosca <rosca.viktor@gmail.com>
@icraggs icraggs added this to the 1.4.0 milestone Sep 30, 2021
@icraggs icraggs self-assigned this Sep 30, 2021
@icraggs
Copy link
Contributor

icraggs commented Sep 30, 2021

Thank you for the comprehensive description and fix.

@icraggs icraggs merged commit 286d65a into eclipse:develop Sep 30, 2021
@roscaviktor roscaviktor deleted the async-terminate-crash branch September 30, 2021 18:22
@icraggs icraggs modified the milestones: 1.4.0, 1.3.10 Jan 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants