Segmentation fault in Socket_getReadySocket #696

Closed
j8phi opened this issue Aug 1, 2019 · 11 comments

j8phi commented Aug 1, 2019

I'm getting a segmentation fault in Socket_getReadySocket when it calls select. This is for an embedded application: I attempt to connect to the broker, and if MQTTClient_connect returns an error, I wait 30 seconds, call MQTTClient_disconnect and then MQTTClient_destroy, recreate the client, and attempt to connect again.

I'm simulating a case where there isn't a good connection to the broker so it's continuously attempting to connect. Sometimes it may connect but it will frequently lose the connection and attempt to reconnect. I've attached the trace files and the gdb trace.

mqttlvl1.txt
mqttlvl3.txt

#0 0xfe373450 in ?? ()
(gdb) bt
#0 0xfe373450 in ?? ()
#1 0xfe373384 in ?? ()
#2 0x484ff5ac in Socket_getReadySocket (more_work=1218187716, tp=0x47fbeeb0, mutex=0x47fbef10) at /shared/build/test/LIBS/pahoMqttLocal/src/Socket.c:267
#3 0x00000000 in ?? ()
(gdb)
#0 0xfe373450 in ?? ()
#1 0xfe373384 in ?? ()
#2 0x484ff5ac in Socket_getReadySocket (more_work=1218187716, tp=0x47fbeeb0, mutex=0x47fbef10) at /shared/build/test/LIBS/pahoMqttLocal/src/Socket.c:267
#3 0x00000000 in ?? ()
(gdb) f 2
#2 0x484ff5ac in Socket_getReadySocket (more_work=1218187716, tp=0x47fbeeb0, mutex=0x47fbef10) at /shared/build/test/LIBS/pahoMqttLocal/src/Socket.c:267
267 rc = select(s.maxfdp1, &(s.rset), &pwset, NULL, &timeout);
(gdb) print s.maxfdp1
$1 = 0
(gdb) print s.rset
$2 = {fds_bits = {0, 0, 0, 0, 0, 0, 0, 0}}
(gdb) print pwset
$3 = {fds_bits = {1207692928, 0, 0, 0, 0, 0, 0, 0}}
(gdb) print timeout
$4 = {tv_sec = 2, tv_usec = 1218187716}

I also have other threads using sockets to send other types of data. I'm using libraries based on the latest changes in the develop branch where it's being prepped for release 1.3.1.
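
A note on the gdb output above: `timeout.tv_usec` prints as 1218187716, which is outside the 0 to 999999 range a `struct timeval` is expected to hold, and it is the same number as the `more_work` argument shown in frame 2. That may indicate an uninitialized or overwritten value rather than a deliberate timeout, although the thread does not confirm this and it would not by itself explain a crash (POSIX allows `select` to fail with `EINVAL` for an invalid timeout). A minimal sketch of a sanity check that could be placed in front of a `select` call while debugging; `timeval_is_valid` is a hypothetical helper name, not part of the Paho library:

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical debugging helper (not a Paho API).
 * Returns true when tv holds a well-formed timeout: non-negative seconds
 * and microseconds in the range [0, 1000000).
 * A NULL pointer is reported as invalid here, although select() itself
 * treats a NULL timeout as "block indefinitely". */
bool timeval_is_valid(const struct timeval *tv)
{
    return tv != NULL
        && tv->tv_sec >= 0
        && tv->tv_usec >= 0
        && tv->tv_usec < 1000000;
}
```

With the values printed by gdb (`tv_sec = 2`, `tv_usec = 1218187716`) this check returns false, so logging or asserting on it just before the `select` call would show whether the timeout is already damaged at that point.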

Contributor

aggresss commented Aug 6, 2019

#689

Contributor

icraggs commented Aug 6, 2019

From that gdb backtrace, it looks like you have multiple threads calling Socket_getReadySocket(). That's not intended to happen, and if true, is the source of the problem.

Author

j8phi commented Aug 6, 2019

I have all the MQTT logic running in one thread in asynchronous mode, so the only other thread would be the helper thread that gets created. The pasted trace may not have been clear: it is just the output of the commands I typed in a single gdb session.

Contributor

icraggs commented Aug 6, 2019

  1. If you do have a small program which exhibits the problem, then that would be the best as we could add it to the test suite.

  2. When attempting to reconnect, you don't need to destroy and recreate the client object. In fact, it's best if you don't. I'm not saying that's the cause of the problem - just what I'd advise anyway.

Author

j8phi commented Aug 6, 2019

I don't have a small program at the moment that I can put here.

I was destroying the client mainly to change the client ID before reconnecting. Is there a way to do that without having to destroy it?

Contributor

icraggs commented Aug 8, 2019

No, you can't change the client id without recreating. This was to minimise accidental errors with persistence, where if the client id is changed, any persisted data is invalidated.

Author

j8phi commented Aug 16, 2019

One other thing I noticed is that every time I call MQTTClient_connect after a failed connection attempt, new file descriptors are created. I'm using the WebSocket method to connect to the broker. I eventually get to a point where there are no more descriptors left for the rest of the system.

Here's roughly what I'm doing:

while (MQTTCLIENT_SUCCESS != MQTTClient_connect(client, &conn_opts))
{
    MQTTClient_disconnect(client, 30);
    sleep(30);
}

Is there something that I'm missing that's supposed to close all the opened descriptors?
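
To confirm a descriptor leak like the one described above, it can help to count the process's open descriptors before and after each connection attempt. The following is a minimal POSIX sketch, not something taken from the Paho sources; `count_open_fds` is a hypothetical helper name, and the 65536 cap is an arbitrary assumption to keep the scan cheap on systems with a very large descriptor limit:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical diagnostic helper (not a Paho API).
 * Counts the descriptors currently open in this process by probing each
 * descriptor number with fcntl(F_GETFD), which fails for unused numbers. */
int count_open_fds(void)
{
    long max_fd = sysconf(_SC_OPEN_MAX);
    /* Assumption: fall back to, and cap at, 65536 so the loop stays short.
     * Descriptors numbered above the cap are not counted. */
    if (max_fd < 0 || max_fd > 65536)
        max_fd = 65536;

    int open_count = 0;
    for (long fd = 0; fd < max_fd; fd++) {
        if (fcntl((int)fd, F_GETFD) != -1)
            open_count++;
    }
    return open_count;
}
```

Logging the value of `count_open_fds()` at the top of each retry iteration would show whether the count grows by a fixed amount per failed attempt, which is the pattern reported here.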

Author

j8phi commented Aug 19, 2019

I went back to version 1.3.0 and don't see this issue with the same code above. I debugged, and it looks like every time MQTTClient_connect is called, a new socket is created and just left open. It seems like we should be reusing the previously opened socket until we time out or there's an error.

Contributor

icraggs commented Aug 20, 2019

A connect will create a new socket each time. There might be different connect options which need to be applied in any case so that's unlikely to change.
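
If every connect creates a new socket, then every failed connect also has to release the socket it created, otherwise repeated attempts use up descriptors. The sketch below shows that general pattern with plain POSIX sockets. It is an illustration only and not how the Paho library is implemented; `connect_once` is a hypothetical function name, and it handles IPv4 literal addresses only:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical illustration (not Paho code).
 * Makes one TCP connection attempt to an IPv4 literal address and port.
 * Returns the connected descriptor on success. On any failure it closes
 * the socket created for this attempt and returns -1. */
int connect_once(const char *ip, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(fd); /* without this, each failed attempt leaks one descriptor */
        return -1;
    }
    return fd;
}
```

A retry loop built on a function like this can run indefinitely against an unreachable broker without the open-descriptor count growing, because each failed attempt cleans up after itself.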

Out of interest, why do you need to change the client id on each connect? That's potentially going to leave more state around on the server.

Author

j8phi commented Aug 20, 2019

What do you mean by different connect options?

I'm not changing the client id on every connect. I run a loop similar to the one above for an hour, and if it can't connect, I destroy the client, recreate it with a new ID, and try to connect again. I was doing that mainly to make sure there was nothing wrong on my end, in case the IoT platform was blocking the connection because of the client ID. I took your advice about not destroying it, so I'm just using the same client ID now, but that's when I noticed all the open descriptors.

icraggs added this to the 1.3.2 milestone Mar 16, 2020

Contributor

icraggs commented Mar 16, 2020

Potentially fixed in the develop branch.

icraggs closed this as completed Apr 17, 2020