I managed to get my firmware working again by using This has been stable since the last post (3 months). Today I changed to the latest Espressif Arduino SDK for PlatformIO and the random reboots returned.
In the meantime I struggled with some random Exception 28 crashes after building the firmware. Does anybody have a clue what the root cause could be?
-
This may be reaching a bit or not. I suspect there may be a problem with running out of memory. The random Exception 28 hints at that.
After a superficial look with objdump, The There is a build option that allows you to monitor the heap for a low-water mark. See the example given in https://arduino-esp8266.readthedocs.io/en/latest/faq/a06-global-build-options.html#how-to-specify-global-build-defines-and-options . With so many devices, in the event you do not have a convenient way to erase WiFi settings when changing SDKs, I'll leave this pointer
Edit: Expanded Exception 28 description
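To illustrate the low-water-mark idea, here is a minimal sketch that simply remembers the smallest free-heap value it has been fed. This is not the build-option approach from the FAQ linked above; it is a polling stand-in, and the class name `HeapLowWaterMark` is hypothetical. On an ESP8266 the samples would presumably come from `ESP.getFreeHeap()`, called periodically from `loop()`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of the ESP8266 core): keeps the smallest
// free-heap sample seen so far, i.e. a polled low-water mark.
class HeapLowWaterMark {
public:
    // Feed one free-heap sample in bytes.
    // Returns true when the sample is a new minimum.
    bool sample(std::uint32_t freeBytes) {
        if (freeBytes < min_) {
            min_ = freeBytes;
            return true;
        }
        return false;
    }

    // Smallest sample so far; UINT32_MAX if nothing has been sampled yet.
    std::uint32_t lowest() const { return min_; }

private:
    std::uint32_t min_ = std::numeric_limits<std::uint32_t>::max();
};
```

Polling like this can miss short dips between samples, so a mark tracked inside the allocator will generally be more trustworthy. Reporting the value over MQTT or the remote console whenever `sample()` returns true would still show whether the heap trends downward before a crash.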
-
My first thoughts:
-
Is it possible to generate null_function?
-
I am not sure I understand the question. An ISR function could appear to be NULL if it is not defined with Weak functions that are declared by prototype but never defined can pass the link phase and crash when called.
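A minimal sketch of the weak-function case, assuming a GCC or Clang toolchain that supports `__attribute__((weak))`. The names `optional_hook` and `call_hook_if_present` are hypothetical. The hook is declared weak and never defined, so the program still links and the symbol's address resolves to null. Calling it unguarded would jump to address 0.

```cpp
// Hypothetical hook: declared weak and deliberately never defined.
// The linker accepts the missing definition and resolves the address to 0.
extern "C" void optional_hook(void) __attribute__((weak));

// Calls the hook only if some translation unit actually provided it.
// Returns true if the hook existed and was called, false otherwise.
bool call_hook_if_present() {
    if (!optional_hook) {
        return false;  // an unguarded call here would jump to address 0
    }
    optional_hook();
    return true;
}
```

With no definition of `optional_hook` anywhere in the program, `call_hook_if_present()` returns false instead of crashing.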
-
I have about 70-80 ESP8266 nodes around my house.
They all run the same firmware (written by me) and control almost everything in my house.
Recently most of them have started to reboot with Fatal exception:4 flag:3.
Sometimes they stay online for 2 days, other times 2 minutes.
Longest uptime now is 5 days due to a power outage on the supply to my house.
Previously I have had 100+ days of uptime on them, and I only rebooted them to do a firmware update.
The software uses a web server, a TCP client for a remote console, MQTT, NTP time syncing, serial output (if the pin is not used by something else), OTA update and more.
I have implemented a simple stack save function so I can read the stack dump from remote devices.
The stack trace on all of them points to functions outside my code.
I'll include 2 of the decoded stack traces here as attachments,
but this is what it looks like:
garage2_gate-decoded.txt
vinterhage_heatpump-decoded.txt
To me it looks like there is a bug somewhere in lwip2-src/src/core/tcp.c.
This is a file I'm not able to debug.
I have tested disabling functionality piece by piece, but I cannot get rid of the bug.
I have tried reducing the MQTT publish rate, disabling NTP sync, and removing the remote TCP console code.
I have also tried different access points from different vendors (ASUS and UniFi).
They appear to reboot even if there is nothing special happening to them.
I can stress test all the interfaces (HTTP, MQTT, TCP Console and serial) without them rebooting.
Then they randomly reboot and point to tcp.c pm_send_nullfunc.
I almost consistently have 4 out of 70 nodes with less than 1 hour of uptime.
I cannot find anything in common among the ones that reboot most often.
I have done an erase_flash on some of the nodes, but it did not help.
Can anyone think of what might cause this?