CORRUPT HEAP with 0.14.1-b2, ESP32, maybe related to MQTT #3637
Comments
Please use debug build and exception decoder to help out with crash diagnosis. |
Is there a debug build binary, or do I have to build it myself? I looked at the exception decoder, and it seems to be an Arduino plugin. But I also looked at building WLED, and it now seems to be supported only on PlatformIO. So how can I decode a PlatformIO binary with the Arduino IDE? I'm confused and could use some pointers. |
You'll need to compile it yourself to get meaningful output from the exception decoder, using a build environment such as:

[env:debug]
extends = env:esp32dev
monitor_filters = esp32_exception_decoder
build_flags = ${common.build_flags_esp32}
  -D WLED_DEBUG
  ... any other build flags

Then just use PIO's monitor tool. |
Yes, you need VSCode+platformio for building + installing wled from source code. The KB has some guidance for getting started: |
Thank you, it's built with debug enabled and running, we should soon find out whether it produces any useful output. I do see some extra logging, so that's a good start... |
I had to disable the debug output, as I suspect it delayed execution a little and prevented my bug from reproducing. With it enabled, the device ran for 24 hours and lost WiFi a bunch of times (another issue?), but I didn't see a crash. Then I removed WLED_DEBUG and, only 2 hours in, got this nice stack trace:

Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x40157372 PS : 0x00060930 A0 : 0x8014e915 A1 : 0x3ffb5da0
A2 : 0x3ffdae70 A3 : 0x00450008 A4 : 0x0000004e A5 : 0x00000000
A6 : 0x3ffb52a8 A7 : 0x00000000 A8 : 0x00702903 A9 : 0x00702929
A10 : 0x00000000 A11 : 0x0000030a A12 : 0x0070261f A13 : 0x00000b38
A14 : 0x00060920 A15 : 0x00000000 SAR : 0x00000019 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00450018 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
ELF file SHA256: 0000000000000000
Backtrace: 0x4015736f:0x3ffb5da0 0x4014e912:0x3ffb5dd0 0x4011e33a:0x3ffb5df0 0x4014974d:0x3ffb5e10 0x4008b89e:0x3ffb5e40
#0 0x4015736f:0x3ffb5da0 in tcp_output at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/tcp_out.c:1025
#1 0x4014e912:0x3ffb5dd0 in tcp_recved at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/core/tcp.c:1765
#2 0x4011e33a:0x3ffb5df0 in _tcp_recved_api(tcpip_api_call_data*) at .pio\libdeps\debug\AsyncTCP\src/AsyncTCP.cpp:1153
#3 0x4014974d:0x3ffb5e10 in tcpip_thread at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/tcpip.c:483
#4 0x4008b89e:0x3ffb5e40 in vPortTaskWrapper at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)
Rebooting... |
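One way to read the register dump above: `EXCCAUSE` 0x1c corresponds to the LoadProhibited cause named in the panic message, and `EXCVADDR` is the address the failing load touched. The following is a minimal Python sketch, illustrative only, that parses the dump and lists the general-purpose registers whose value lies just below `EXCVADDR`, which hints at the base pointer of the bad access. The helper names are hypothetical and the 64-byte search window is an arbitrary assumption.

```python
import re

# Matches "NAME : 0xXXXXXXXX" pairs as printed in an ESP32 panic register dump.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9]*)\s*:\s*0x([0-9a-fA-F]{8})")


def parse_registers(dump):
    """Map register name -> integer value."""
    return {name: int(value, 16) for name, value in REG_RE.findall(dump)}


def likely_base_registers(regs, window=64):
    """Return (register, offset) pairs where register + offset == EXCVADDR.

    Only A0..A15 are considered, and only offsets in [0, window).
    The window size is an assumption, not something the panic handler defines.
    """
    fault = regs["EXCVADDR"]
    hits = []
    for name, value in regs.items():
        if re.fullmatch(r"A\d+", name) and 0 <= fault - value < window:
            hits.append((name, fault - value))
    return hits


# Register dump copied from the crash report in this thread.
DUMP = """
PC : 0x40157372 PS : 0x00060930 A0 : 0x8014e915 A1 : 0x3ffb5da0
A2 : 0x3ffdae70 A3 : 0x00450008 A4 : 0x0000004e A5 : 0x00000000
A6 : 0x3ffb52a8 A7 : 0x00000000 A8 : 0x00702903 A9 : 0x00702929
A10 : 0x00000000 A11 : 0x0000030a A12 : 0x0070261f A13 : 0x00000b38
A14 : 0x00060920 A15 : 0x00000000 SAR : 0x00000019 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00450018 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0xffffffff
"""

if __name__ == "__main__":
    regs = parse_registers(DUMP)
    print(likely_base_registers(regs))  # [('A3', 16)]
```

Here A3 holds 0x00450008 and the faulting address is exactly 16 bytes above it, so the load in `tcp_output` most likely went through a pointer that was already invalid. That would fit memory corruption somewhere upstream, but the dump alone does not prove it.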
Not in WLED code. Check your MQTT broker. There was an issue with old Windows implementation of Mosquitto broker in the past. |
Here is another one, still from the same build:

CORRUPT HEAP: Bad head at 0x3ffde190. Expected 0xabba1234 got 0x3ffde608
abort() was called at PC 0x4008eb39 on core 0
ELF file SHA256: 0000000000000000
Backtrace: 0x40089af8:0x3ffb5d10 0x40089e55:0x3ffb5d30 0x4008eb39:0x3ffb5d50 0x4008543a:0x3ffb5d70 0x40085805:0x3ffb5d90 0x4000bec7:0x3ffb5db0 0x4016def2:0x3ffb5dd0 0x4016df29:0x3ffb5df0 0x4014974d:0x3ffb5e10 0x4008b89e:0x3ffb5e40
#0 0x40089af8:0x3ffb5d10 in invoke_abort at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:648
#1 0x40089e55:0x3ffb5d30 in abort at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/esp32/panic.c:648
#2 0x4008eb39:0x3ffb5d50 in multi_heap_free at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:321
#3 0x4008543a:0x3ffb5d70 in heap_caps_free at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:232
#4 0x40085805:0x3ffb5d90 in _free_r at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/newlib/syscalls.c:42
#5 0x4000bec7:0x3ffb5db0 in ?? ??:0
#6 0x4016def2:0x3ffb5dd0 in _udp_pcb_deinit at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/mdns/mdns_networking.c:202
#7 0x4016df29:0x3ffb5df0 in _mdns_pcb_deinit_api at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/mdns/mdns_networking.c:267
#8 0x4014974d:0x3ffb5e10 in tcpip_thread at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/lwip/lwip/src/api/tcpip.c:483
#9 0x4008b89e:0x3ffb5e40 in vPortTaskWrapper at /home/cschwinne/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c:355 (discriminator 1)
Rebooting...

What confuses me is that WLED_0.13.3_ESP32 doesn't crash and doesn't disconnect from the WiFi. I have 8 different WLED 0.13.3 instances running on ESP32, and they all have an uptime of 14 days (since the last power outage). All my 3-4 instances of WLED 0.14 reboot, drop off the network, or glitch on the output after a couple of days. They all use the same WiFi access point and the same broker (mosquitto 2.0.15 on Linux, in Docker). |
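The `CORRUPT HEAP: Bad head` message comes from the heap poisoning check in ESP-IDF (the `multi_heap_poisoning.c` frame in the trace): an allocated block is preceded by a header that should start with the canary value 0xabba1234, and freeing the block aborts if something else is found there. Below is a minimal Python sketch, assuming the message format shown above, that parses such a line and checks whether the overwriting value looks like an internal-RAM pointer. The DRAM window (0x3FFAE000 to 0x3FFFFFFF) is my assumption for the classic ESP32, and the function and field names are hypothetical.

```python
import re

# Assumed internal data-RAM window of the classic ESP32 (not taken from the log).
DRAM_START, DRAM_END = 0x3FFAE000, 0x3FFFFFFF

MSG_RE = re.compile(
    r"CORRUPT HEAP: Bad head at 0x([0-9a-fA-F]+)\. "
    r"Expected 0x([0-9a-fA-F]+) got 0x([0-9a-fA-F]+)"
)


def parse_bad_head(line):
    """Parse a 'Bad head' line; return None if the line does not match."""
    match = MSG_RE.search(line)
    if match is None:
        return None
    addr, expected, got = (int(group, 16) for group in match.groups())
    return {
        "addr": addr,          # where the block header lives
        "expected": expected,  # canary the heap code wanted to see
        "got": got,            # value actually found there
        "got_looks_like_dram_pointer": DRAM_START <= got <= DRAM_END,
        "distance": got - addr,  # how far the found value is from the header
    }


if __name__ == "__main__":
    # The two reports from this thread.
    for line in (
        "CORRUPT HEAP: Bad head at 0x3ffde190. Expected 0xabba1234 got 0x3ffde608",
        "CORRUPT HEAP: Bad head at 0x3ffd7838. Expected 0xabba1234 got 0x3ffd7864",
    ):
        print(parse_bad_head(line))
```

In both reports the value found in place of the canary is itself an address in the same RAM region, close to the damaged header. That is what one might expect if a pointer was written into memory that had already been freed or reused, but it is a guess rather than a diagnosis.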
I've just updated mosquitto to the latest version available on dockerhub, 2.0.18, we'll see if WLED still crashes |
Very likely related to #3641 |
Since I updated mosquitto, I haven't seen any stack trace. However, I can't test for long, because after a few hours I always lose WiFi connectivity with 0.14*; I will check later whether someone has reported a bug on that. Once WiFi is lost, there is obviously no chance of network packets being malformed or misread, since none reach the IP stack. However, the corrupted heap crash was occurring much earlier than the WiFi drop, so the stability issues are probably indeed related to the broker, and/or to mDNS (which I saw mentioned in one of the stack traces). |
If not yet, please use 0.14.1-b3.

EDIT: WiFi issues are not WLED related but rather your network set-up/hardware. |
I get your point of view, but like I said earlier, I have 8 WLED instances on 0.13.x that have been running for a very long time without any WiFi connectivity issues. My 4 ESP32s with 0.14.1-b2 all consistently drop off the WiFi. The ESP32s are sourced from 3 different vendors (some are standard ESP32 dev boards, 3 are QuinLED Dig-Uno/Quad, 3 are my own PCB with an ESP32 assembled by JLCPCB). Only those with 0.14 lose connectivity and can't recover until I power-cycle them.

I know I have a very short DHCP lease time (15 minutes) on my network; I had forgotten about it after some older network manipulation. But all the ESP32 and ESP8266 devices running WLED 0.13.x, plus Shelly, Amazon Echo, Sonos and others on this network, have happily stayed connected since forever, except for all occurrences of WLED 0.14, which consistently lose WiFi after a few hours. I don't want to change the DHCP lease time until I get to the bottom of this issue.

The access points are Ubiquiti UniFi 6 something (I can't remember the exact model), pretty much top of the line as of 2 years ago, and my WiFi coverage has indeed been pretty solid since I installed them. The router/DHCP server is Netgate, also pretty much top of the line. Both the access points and the router were recently rebooted and seem snappy and happy.

I know you get a lot of users with weird setups coming to nag here, but please don't dismiss this so fast, because from my analysis, all clues point to the version of WLED running on the ESP32. |
Hi, the two crashes both happen deep inside the TCP and UDP core, without any WLED source code in the trace. The second crash (with [...]

To preserve memory, it usually helps to disable some "bells and whistles" - like [...]

👉 Did you try with the latest beta 0.14.1-b3? We have fixed some use-after-free problems recently, so the latest beta might behave better.

As a last resort, you could wipe your device completely with [...]

Finally, some WiFi problems go away when using a newer Espressif framework - buildenv esp32dev_V4_dio80. |
I did not dismiss you out of the blue. So I will insist on WiFi or other network traffic issues which WLED cannot solve. For clarification: the network parts have not been modified since 0.12. The only addition was a signal strength fix for newer ESP32 models like C3, S2 and S3, which is a compile-time option. |
FYI, having "Fast roaming" or BSS Transition enabled is known to cause issues with non-compliant hardware. WLED does not support those protocols. |
A newer bootloader may be needed, as it initialises hardware prior to the firmware. If your devices have an old bootloader (pre 0.13), they may need a bootloader update. |
As an update, I have used https://wled-install.github.io/ and flashed the version "Standard version 0.14.1 V4 (ESP IDF 4.4.3 based, experimental, should resolve reboot issues)" and so far it seems stable. I was losing connectivity or seeing reboots much much faster, and so far it's running 24h and still online, responsive and snappy. |
6 days uptime going strong, i think this is it |
Hey! This issue has been open for quite some time without any new comments now. It will be closed automatically in a week if no further activity occurs. |
What happened?
Hello,
All my 5 ESP32's running WLED_0.14.1-b2_ESP32.bin keep rebooting randomly, sometimes after only a few hours. They're all connected to my MQTT broker, with moderate traffic.
I have another ESP32 with WLED_0.14.1-b2_ESP32_audioreactive.bin, on that one MQTT isn't enabled, and since the upgrade from 0.14 to 0.14.1-b2, it's stable so far (3 days uptime).
I have managed to capture a stacktrace, but i don't know how to decode it.
This stacktrace was generated from WLED_0.14.1-b2_ESP32.bin, at least the binary from install.wled.me
To Reproduce Bug
Expected Behavior
No crash
Install Method
Binary from WLED.me
What version of WLED?
WLED 0.14.1-b2 (build 2312290)
Which microcontroller/board are you seeing the problem on?
ESP32
Relevant log/trace output
CORRUPT HEAP: Bad head at 0x3ffd7838. Expected 0xabba1234 got 0x3ffd7864
abort() was called at PC 0x4008eb39 on core 0
ELF file SHA256: 0000000000000000
Backtrace: 0x40089af8:0x3ffb5d10 0x40089e55:0x3ffb5d30 0x4008eb39:0x3ffb5d50 0x4008543a:0x3ffb5d70 0x40085805:0x3ffb5d90 0x4000bec7:0x3ffb5db0 0x4016def2:0x3ffb5dd0 0x4016df29:0x3ffb5df0 0x4014974d:0x3ffb5e10 0x4008b89e:0x3ffb5e40
Rebooting...
ets Jul 29 2019 12:21:46
rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DOUT, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5828
entry 0x400806a8
Ada
Anything else?
Thank you for your help!
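For anyone who has captured a raw `Backtrace:` line like the one above without the monitor filter running, the decoding can also be attempted offline. This is a minimal Python sketch, not the exception decoder's actual implementation: it extracts the program-counter half of each `PC:SP` pair and builds an `addr2line` command line. The tool name `xtensa-esp32-elf-addr2line` and the ELF path `.pio/build/debug/firmware.elf` are assumptions about a typical PlatformIO ESP32 setup. The addresses only resolve correctly against the ELF file of the exact build that crashed, which is why a self-compiled build is needed.

```python
import re

# Matches one "PC:SP" pair as printed in an ESP32 "Backtrace:" line.
FRAME_RE = re.compile(r"0x([0-9a-fA-F]{8}):0x([0-9a-fA-F]{8})")


def backtrace_pcs(line):
    """Return the program-counter address of every frame, in order."""
    return ["0x" + pc.lower() for pc, _sp in FRAME_RE.findall(line)]


def addr2line_command(line,
                      elf=".pio/build/debug/firmware.elf",
                      tool="xtensa-esp32-elf-addr2line"):
    """Build (but do not run) an addr2line invocation for a backtrace line.

    Both default arguments are assumptions; adjust them to your toolchain
    and to the environment name you actually built.
    """
    # -p pretty-print, -f function names, -i inlined frames,
    # -a show addresses, -C demangle C++ names
    return [tool, "-pfiaC", "-e", elf] + backtrace_pcs(line)


if __name__ == "__main__":
    raw = ("Backtrace: 0x40089af8:0x3ffb5d10 0x40089e55:0x3ffb5d30 "
           "0x4008eb39:0x3ffb5d50 0x4008543a:0x3ffb5d70")
    # Print the command so it can be pasted into a shell.
    print(" ".join(addr2line_command(raw)))
```

Building the command as a list keeps it ready for `subprocess.run` if you prefer to run it from the script rather than paste it into a shell.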