MQTTAsync doesn't reconnect when PINGRESP not received in keepalive interval. #1276

Closed
bowensmachinesense opened this issue Oct 13, 2022 · 3 comments

Describe the bug
When automaticReconnect is set, the client does not reconnect after a PINGRESP is not received. It reconnects fine if the server goes down and comes back up, but not in this case.

I found this while bringing network interfaces up and down.
I'm using version 1.3.11.

** Trace **
bin# ./paho_c_pub -t "test" -h xxx.xxx.xxx.xxx --trace protocol
Trace : 3, =========================================================
Trace : 3, Trace Output
Trace : 3, Product name: Eclipse Paho Asynchronous MQTT C Client Library
Trace : 3, Version: 1.3.11
Trace : 3, Build level: 2022-04-11T20:02:07Z
Trace : 3, OpenSSL version: OpenSSL 1.1.1n 15 Mar 2022
Trace : 3, OpenSSL flags: compiler: aarch64-openwrt-linux-musl-gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -Os -pipe -mcpu=cortex-a53 -fno-caller-saves -fno-plt -fhonour-copts -Wno-error=unused-but-set-variable -Wno-error=unused-result -Wformat -Werror=format-security -fstack-protector -D_FORTIFY_SOURCE=1 -Wl,-z,now -Wl,-z,relro -DPIC -fPIC -ffunction-sections -fdata-sections -znow -zrelro -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK16
Trace : 3, OpenSSL build timestamp: built on: Mon Apr 11 20:02:07 2022 UTC
Trace : 3, OpenSSL platform: platform: linux-aarch64-openwrt
Trace : 3, OpenSSL directory: OPENSSLDIR: "/etc/ssl"
Trace : 3, /proc/version: Linux version 5.10.110 (edit@edited) (aarch64-openwrt-linux-musl-gcc (OpenWrt GCC 11.2.0 r19433-8689e8ccba) 11.2.0, GNU ld (GNU Binutils) 2.37) #0 SMP Mon Apr 11 20:02:07 2022

Trace : 3, =========================================================
Trace : 4, 19700101 000000.000 Connecting to serverURI xxx.xxx.xxx.xxx:1883 with MQTT version 4
Trace : 4, 19700101 000000.000 3 paho-c-pub -> CONNECT version 4 clean: 1 (0)
Trace : 4, 19700101 000000.000 3 paho-c-pub <- CONNACK rc: 0
abc
Trace : 4, 19700101 000000.000 3 paho-c-pub -> PUBLISH qos: 0 retained: 0 rc: 0 payload len(4): abc

Trace : 4, 19700101 000000.000 3 paho-c-pub -> PINGREQ (0)
Trace : 4, 19700101 000000.000 3 paho-c-pub <- PINGRESP
Trace : 4, 19700101 000000.000 3 paho-c-pub -> PINGREQ (0)
Trace : 4, 19700101 000000.000 PINGRESP not received in keepalive interval for client paho-c-pub on socket 3, disconnecting
Trace : 4, 19700101 000000.000 3 paho-c-pub -> DISCONNECT (0)
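The last lines of the trace show the keepalive check firing. As a rough illustration of that kind of check, here is a self-contained C sketch. It is a hypothetical model, not code from the Paho library: the names `ka_state`, `ka_check` and `ka_on_pingresp` are invented for this example, and it assumes the client allows one full keepalive interval for the PINGRESP, which matches the wording of the trace message but has not been checked against the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a keepalive check. Names are illustrative and
 * are not taken from the Paho source. Times are in whole seconds. */
typedef enum { KA_IDLE, KA_SEND_PING, KA_TIMED_OUT } ka_action;

typedef struct {
    int  keepalive_s;      /* keepalive interval */
    int  last_activity_s;  /* time of the last packet sent or received */
    bool ping_outstanding; /* PINGREQ sent, PINGRESP not yet seen */
    int  ping_sent_s;      /* when that PINGREQ was sent */
} ka_state;

/* Decide what the client should do at time now_s. */
ka_action ka_check(ka_state *s, int now_s)
{
    assert(s->keepalive_s > 0);
    if (s->ping_outstanding) {
        /* Assumption: the broker gets one full interval to answer. */
        if (now_s - s->ping_sent_s >= s->keepalive_s)
            return KA_TIMED_OUT;
        return KA_IDLE;
    }
    if (now_s - s->last_activity_s >= s->keepalive_s) {
        /* The connection has been quiet for a whole interval: ping. */
        s->ping_outstanding = true;
        s->ping_sent_s = now_s;
        return KA_SEND_PING;
    }
    return KA_IDLE;
}

/* Record a PINGRESP from the broker. */
void ka_on_pingresp(ka_state *s, int now_s)
{
    s->ping_outstanding = false;
    s->last_activity_s = now_s;
}
```

In this model, with a 60-second keepalive, a PINGREQ sent at t=121 that is still unanswered at t=181 makes `ka_check` return `KA_TIMED_OUT`, which corresponds to the "PINGRESP not received in keepalive interval" line in the trace.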
To Reproduce
I'm using two interfaces, a wired one (eth0) and a wireless one (wlan0). The wired interface has a higher metric set, so when the wireless interface is up, traffic is routed through the wireless interface.

Run paho_c_pub with the wireless interface up. (It works properly at this point; the trace shows that abc was published.)
Bring the wireless interface down. (Now only the Ethernet interface is active.)
The PINGRESP does not arrive (as expected) and the client disconnects.
The client then never tries to reconnect.

Expected behavior
After the ping response fails and the disconnect occurs, I expect to at least see reconnect attempts. If I run the same command and bring the MQTT server down instead, I see reconnect attempts, and when the server comes back up the reconnect succeeds.
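The expected behaviour can be sketched as a small state model in which both kinds of connection loss go through one handler. This is a hypothetical, self-contained illustration and not Paho's implementation: `client_model`, `on_connection_lost` and the other names are invented here. The doubling retry delay is modelled on the documented `minRetryInterval` and `maxRetryInterval` fields of `MQTTAsync_connectOptions`; the values 1 and 60 seconds used in the note after the block are assumed defaults.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the expected reconnect behaviour.
 * None of these names come from the Paho source. */
typedef enum { LOSS_SOCKET_ERROR, LOSS_PING_TIMEOUT } loss_cause;
typedef enum { ST_CONNECTED, ST_RECONNECTING, ST_DISCONNECTED } conn_state;

typedef struct {
    bool       automatic_reconnect; /* models the automaticReconnect option */
    conn_state state;
    int        min_retry_s;         /* first retry delay, in seconds */
    int        max_retry_s;         /* upper bound on the retry delay */
    int        retry_s;             /* delay before the next attempt */
} client_model;

/* Single entry point for every kind of connection loss. */
void on_connection_lost(client_model *c, loss_cause cause)
{
    /* The cause is deliberately ignored: a ping timeout and a socket
     * error are expected to lead to the same reconnect handling. */
    (void)cause;
    if (c->automatic_reconnect) {
        assert(c->min_retry_s > 0 && c->max_retry_s >= c->min_retry_s);
        c->state = ST_RECONNECTING;
        c->retry_s = c->min_retry_s;
    } else {
        c->state = ST_DISCONNECTED;
    }
}

/* A reconnect attempt failed: double the delay, capped at the maximum. */
void on_reconnect_failed(client_model *c)
{
    if (c->state != ST_RECONNECTING)
        return;
    if (c->retry_s > c->max_retry_s / 2)
        c->retry_s = c->max_retry_s;
    else
        c->retry_s *= 2;
}

/* A reconnect attempt succeeded. */
void on_reconnect_succeeded(client_model *c)
{
    c->state = ST_CONNECTED;
}
```

In this model, calling `on_connection_lost(&c, LOSS_PING_TIMEOUT)` with `automatic_reconnect` set leaves the client in `ST_RECONNECTING`, exactly as `LOSS_SOCKET_ERROR` does. With a minimum of 1 second and a maximum of 60 seconds, repeated failures give delays of 1, 2, 4, 8, 16, 32 and then 60 seconds.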

** Environment **
OpenWrt, Linux kernel 5.10.110

I've run some traces at higher trace levels and searched through the code a bit, and I see that a ping response failure follows a different code path than a server going down.

Let me know if there is more information I can provide. I can reproduce it every time I follow the steps above. I originally came across this in my own code base, so I made sure I could reproduce it with the provided sample code.

For obvious reasons, the IP addresses are redacted.

Also note that no further output appears at the PROTOCOL trace level after the DISCONNECT.

Thanks.

B...

@bowensmachinesense
Author

Additional information.

I had time to build and test other versions: 1.3.9 and 1.3.10 don't exhibit this problem.

icraggs added a commit that referenced this issue Nov 27, 2022
@icraggs icraggs added this to the 1.3.12 milestone Nov 27, 2022
@icraggs
Contributor

icraggs commented Nov 27, 2022

I've added a fix to the develop branch which seems to work for me.

@icraggs
Contributor

icraggs commented Nov 30, 2022

Hmm. That didn't seem to be the proper fix. This one should be it.

This issue was closed.