
"Task watchdog" Reboot loop at startup with 0.8.140 and 0.8.141 / INT Pin status "unknown" for nRF24L01? #1733

Closed
juepi opened this issue Aug 20, 2024 · 28 comments
Labels
bug Something isn't working

Comments

@juepi

juepi commented Aug 20, 2024

Platform

ESP32

Assembly

I did the assembly by myself

nRF24L01+ Module

nRF24L01+ plus

Antenna

circuit board

Power Stabilization

Elko (electrolytic capacitor, ~100 µF)

Connection picture

  • I will attach/upload an image of my wiring

Version

0.8.140

Github Hash

f1f4481

Build & Flash Method

AhoyDTU Webinstaller

Setup

MQTT configured, Inverter interval 5 seconds, 2 inverters configured, one activated

Debug Serial Log output

No response

Error description

Really not sure if this is a problem with my infrastructure, but I'm reporting it anyway in case it sounds familiar to someone:
Hardware: ESP32-S3, using image *_opendtufusion.bin

It was running fine on 0.8.130 for several weeks. I decided to update to the new release 0.8.140; however, directly after the upgrade (via the AhoyDTU /update page), I experienced the following issues:

  • WebUI landing page seems to lose the system / NTP time every few seconds
  • the enabled inverter on the landing page jumps from "producing" to "not available" every few seconds
  • in the Live view, the inverter data keeps switching between "greyed out" and "producing"
  • MQTT communication did not seem to work (configured power limit changes were not updated in AhoyDTU)

I assume the corrupted system time is the root cause of all the other problems.

I also tried the current dev build 0.8.141: same issue. I've downgraded my ESP to the 0.8.130 dev build again, and it seems to work fine now.

yours,
Juergen

@juepi juepi added the new new issue which need review by developer label Aug 20, 2024
@rmayergfx

Which NTP server or IP address is configured? Your local router?

@juepi
Author

juepi commented Aug 20, 2024

I didn't change the default "pool.ntp.org" setting.

@rmayergfx

Give it a try: enter the IP address of your local router, if it provides time services. That is also better if your ISP is down.

@juepi
Author

juepi commented Aug 21, 2024

Give it a try: enter the IP address of your local router, if it provides time services. That is also better if your ISP is down.

Done, using the IP address of my WiFi router. It works perfectly well with 0.8.130. After upgrading to 0.8.140 once again, the same issue is back.

The web landing page keeps switching between
[screenshot]

and
[screenshot]

at irregular intervals (every few seconds). The uptime counter is also not working; I even saw it count backwards once.

After a few minutes, I downgraded to 0.8.130 again. I saw the same behavior for a short time (maybe less than a minute) after booting the ESP with the old version, but it stopped, and AhoyDTU now seems to be running again without any issues.

yours,
Juergen

@rmayergfx

Save your settings, then remove inverter #1 completely and try again with 0.8.141.
Be sure to use the right firmware for your board!

@juepi
Author

juepi commented Aug 22, 2024

Save your settings, then remove inverter #1 completely and try again with 0.8.141. Be sure to use the right firmware for your board!

OK, I'll give it a try when I'm home. But everything works fine with 0.8.130? 🤔

P.S.: I will power up inverter #1 and enable it first, then upgrade to 0.8.141. Or do you expect an issue when more than one inverter is used?

@juepi
Author

juepi commented Aug 23, 2024

Update:

After enabling inverter #1 and rebooting with 0.8.130, everything is fine:
[screenshot]

After upgrading to 0.8.141, the same issues start again. NTP time is fetched:
[screenshot]

Then, after a few seconds, the system time is lost again, which also breaks inverter communication:
[screenshot]

What I found interesting: in the error case, the uptime counter never got higher than 10 seconds. After reaching 9-10 seconds, the system time was lost and the uptime counter reset to a lower value of 4-5 seconds.
I observed this buggy behavior once again for a few minutes, then downgraded to 0.8.130 again.

As I need both inverters to be working, I decided not to delete inverter #1 (if there's a plausible reason to try deleting inv#1, please explain). I'm currently fine with the old version; if there's anything else I can do to help track down the problem, let me know.

For the sake of completeness, I am using this ESP32-S3 board:
ESP32-S3 WROOM-1-N16R8 ESP32-S3-DevKitC-1

NOTE: the "System" page of Ahoy-DTU version 0.8.130 reports a WiFi RSSI of -71 dBm.

yours,
Juergen

@Gubi2023

Hi, look at the "System" page for the reason your DTU restarted. To me it looks like a reboot loop of some kind.

@juepi
Author

juepi commented Aug 23, 2024

Sounds plausible to me, will check this out.

@juepi
Author

juepi commented Aug 23, 2024

You were perfectly right! "System" reports this:
[screenshot]

I assume the "unknown" status of the INT pin is the reason for the problems, but as I'm writing this comment, the status suddenly changed:
[screenshot]

As you can see in the new screenshot, the ESP seems to have stopped rebooting, and everything seems to work fine now.

This is the pinout I configured for the NRF:
"nrf":{"cs":37,"ce":38,"irq":47,"sclk":36,"mosi":35,"miso":48,"en":true}

When I manually reboot the ESP, the reboot loop starts once again and calms down after some time. I will stay on 0.8.141 for a while and see whether it runs stably.

I have also attached a coredump file taken while the reboot loops were in progress; maybe it helps.
2024-08-23_11-17-09_v0.8.141_opendtufusion_coredump.bin.zip

@juepi juepi changed the title NTP / Networking issues with 0.8.140? "Task watchdog" Reboot loop with 0.8.140 and 0.8.141 / INT Pin status "unknown" for nRF24L01? Aug 23, 2024
@juepi
Author

juepi commented Aug 23, 2024

Updated issue title according to the new findings.

@lumapu
Owner

lumapu commented Aug 28, 2024

Can you try to download a coredump from the System page? It would really help to understand what happens. Ideally do that with version .140; but if the last crash happened on .140, you can also read the coredump out using version .130.

@lumapu lumapu self-assigned this Aug 28, 2024
@lumapu lumapu added bug Something isn't working and removed new new issue which need review by developer labels Aug 28, 2024
@juepi
Author

juepi commented Aug 29, 2024

Can you try to download a coredump from the System page? It would really help to understand what happens. Ideally do that with version .140; but if the last crash happened on .140, you can also read the coredump out using version .130.

Hi Lukas,
I already attached a coredump from 0.8.141 two posts ago; can you work with that one? My ESP32 has now been up for nearly 6 days on 0.8.141 and is running without any issues, handling zero export for 2 inverters (changing power limits every 5 seconds via MQTT).
As the system is live, I'd like to keep outages to a minimum 😉

Let me know if you still need the requested coredump from 0.8.140, and I'll create one.

yours,
Juergen

@juepi
Author

juepi commented Aug 29, 2024

A small update from my side: I just had to reboot my AhoyDTU after a configuration change, and it came up instantly, without the reboot loop and with the INT pin status set to "true".

I made the following changes because Inverter1 was replaced (an HM-800 swapped for an HM-400):

  • Disabled Inverter1
  • changed S/N of Inverter1
  • Saved changes, reboot

EDIT: after another change (deleted Inverter1, reboot), the problem occurred again; I created another coredump:
2024-08-29_10-24-02_v0.8.141_opendtufusion_coredump.zip

So it seems that the reboot loop does not occur on every reboot. Also, every reboot loop seems to end after a while (minutes), and after that, AhoyDTU runs perfectly well.

yours,
Juergen

@lumapu
Owner

lumapu commented Aug 29, 2024

Hey Juergen,

I translated your Coredumps:

2024-08-23_11-17-09_v0.8.141_opendtufusion_coredump.bin
===============================================================
==================== ESP32 CORE DUMP START ====================

Crashed task handle: 0x3fcf69a4, name: '', GDB name: 'process 1070557604'

================== CURRENT THREAD REGISTERS ===================
exccause       0x1d (StoreProhibitedCause)
excvaddr       0x0
epc1           0x42079715
epc2           0x0
epc3           0x0
epc4           0x0
epc5           0x0
epc6           0x0
eps2           0x0
eps3           0x0
eps4           0x0
eps5           0x0
eps6           0x0


==================== CURRENT THREAD STACK =====================
pc             0x40377da5          0x40377da5 <panic_abort+21>
lbeg           0x40056f5c          1074098012
lend           0x40056f72          1074098034
lcount         0x0                 0
sar            0x4                 4
ps             0x60821             395297
threadptr      <unavailable>
br             <unavailable>
scompare1      <unavailable>
acclo          <unavailable>
acchi          <unavailable>
m0             <unavailable>
m1             <unavailable>
m2             <unavailable>
m3             <unavailable>
expstate       <unavailable>
f64r_lo        <unavailable>
f64r_hi        <unavailable>
f64s           <unavailable>
fcr            <unavailable>
fsr            <unavailable>
a0             0x8037d104          -2143825660
a1             0x3fc96a90          1070164624
a2             0x3fc96afa          1070164730
a3             0x3fc96b27          1070164775
a4             0xa                 10
a5             0x35                53
a6             0x0                 0
a7             0x3fc96a35          1070164533
a8             0x0                 0
a9             0x1                 1
a10            0x3fc96ade          1070164702
a11            0x3fc96ade          1070164702
a12            0xa                 10
a13            0x0                 0
a14            0x2c973d0           46756816
a15            0xffffff            16777215

======================== THREADS INFO =========================
#0  0x40377da5 in panic_abort (details=0x3fc96afa "abort() was called at PC 0x4204a300 on core 0") at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
#1  0x4037d104 in esp_system_abort (details=0x3fc96afa "abort() was called at PC 0x4204a300 on core 0") at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
#2  0x40383c10 in abort () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/abort.c:46
#3  0x4204a303 in task_wdt_isr (arg=<optimized out>) at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/task_wdt.c:176
#4  0x40379478 in _xt_lowint1 () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1118
#5  0x420cb9c2 in cpu_ll_waiti () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/hal/esp32s3/include/hal/cpu_ll.h:182
#6  esp_pm_impl_waiti () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_pm/pm_impl.c:853
#7  0x4204ab74 in esp_vApplicationIdleHook () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/freertos_hooks.c:63
#8  0x4037e70b in prvIdleTask (pvParameters=<optimized out>) at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/tasks.c:4099
Retrying reading threads information...


       TCB             NAME PRIO C/B  STACK USED/FREE
---------- ---------------- -------- ----------------
0x3fcf69a4                 1070556564/0           76/608
0x3fcf6f1c                 1070557964/0           84/608
0x3fceea00                 1070521840/18           88/672
0x3fcf359c                 1070539148/18          84/1168
0x3fcf7d4c                 1070560060/20           84/688
0x3fcf1750                 1070535488/24           88/608
0x3fcf622c                 1070551580/19           84/624
0x3fcf10ac                 1070533788/24           84/608
0x3fcb7b28                 1070343200/3        47280/624
0x3fcb788c                 1070336064/1        40812/672
0x3fcbfc08                 1070316536/10           80/640
0x3fcec150                 1070506304/1          88/1056
0x3fcb11ac                 1070266268/23           84/656
0x3fcf4d88                 1070545784/22           80/640

==================== THREAD 1 (TCB: 0x3fcf69a4, name: '') =====================


==================== THREAD 2 (TCB: 0x3fcf6f1c, name: '') =====================


==================== THREAD 3 (TCB: 0x3fceea00, name: '') =====================


==================== THREAD 4 (TCB: 0x3fcf359c, name: '') =====================


==================== THREAD 5 (TCB: 0x3fcf7d4c, name: '') =====================


==================== THREAD 6 (TCB: 0x3fcf1750, name: '') =====================


==================== THREAD 7 (TCB: 0x3fcf622c, name: '') =====================


==================== THREAD 8 (TCB: 0x3fcf10ac, name: '') =====================


==================== THREAD 9 (TCB: 0x3fcb7b28, name: '') =====================


==================== THREAD 10 (TCB: 0x3fcb788c, name: '') =====================


==================== THREAD 11 (TCB: 0x3fcbfc08, name: '') =====================


==================== THREAD 12 (TCB: 0x3fcec150, name: '') =====================


==================== THREAD 13 (TCB: 0x3fcb11ac, name: '') =====================


==================== THREAD 14 (TCB: 0x3fcf4d88, name: '') =====================



======================= ALL MEMORY REGIONS ========================
Name   Address   Size   Attrs
.rtc.text 0x600fe000 0x0 RW
.rtc.dummy 0x600fe000 0x0 RW
.rtc.force_fast 0x600fe000 0x0 RW
.rtc.force_slow 0x50000010 0x0 RW
.iram0.vectors 0x40374000 0x403 R XA
.iram0.text 0x40374404 0x1138f R XA
.dram0.data 0x3fc957a0 0x57d0 RW A
.noinit 0x3fc9af70 0x0 RW
.flash.text 0x42000020 0xd10c7 R XA
.flash.appdesc 0x3c0e0020 0x100 R  A
.flash.rodata 0x3c0e0120 0x47a4c RW A
.iram0.text_end 0x40385793 0x0 RW
.iram0.bss 0x40385794 0x0 RW
.dram0.heap_start 0x3fcae2a0 0x0 RW
.coredump.tasks.data 0x3fcf69a4 0x158 RW
.coredump.tasks.data 0x3fcf6730 0x260 RW
.coredump.tasks.data 0x3fcf6f1c 0x158 RW
.coredump.tasks.data 0x3fcf6ca0 0x260 RW
.coredump.tasks.data 0x3fceea00 0x158 RW
.coredump.tasks.data 0x3fcee740 0x2a0 RW
.coredump.tasks.data 0x3fcf359c 0x158 RW
.coredump.tasks.data 0x3fcf30f0 0x490 RW
.coredump.tasks.data 0x3fcf7d4c 0x158 RW
.coredump.tasks.data 0x3fcf7a80 0x2b0 RW
.coredump.tasks.data 0x3fcf1750 0x158 RW
.coredump.tasks.data 0x3fcf14d0 0x260 RW
.coredump.tasks.data 0x3fcf622c 0x158 RW
.coredump.tasks.data 0x3fcf5fa0 0x270 RW
.coredump.tasks.data 0x3fcf10ac 0x158 RW
.coredump.tasks.data 0x3fcf0e30 0x260 RW
.coredump.tasks.data 0x3fcb7b28 0x158 RW
.coredump.tasks.data 0x3fcc31a0 0x270 RW
.coredump.tasks.data 0x3fcb788c 0x158 RW
.coredump.tasks.data 0x3fcc1590 0x2a0 RW
.coredump.tasks.data 0x3fcbfc08 0x158 RW
.coredump.tasks.data 0x3fcbf970 0x280 RW
.coredump.tasks.data 0x3fcec150 0x158 RW
.coredump.tasks.data 0x3fcebd10 0x420 RW
.coredump.tasks.data 0x3fcb11ac 0x158 RW
.coredump.tasks.data 0x3fcb0f00 0x290 RW
.coredump.tasks.data 0x3fcf4d88 0x158 RW
.coredump.tasks.data 0x3fcf4af0 0x280 RW

===================== ESP32 CORE DUMP END =====================
===============================================================
2024-08-29_10-24-02_v0.8.141_opendtufusion_coredump.bin
===============================================================
==================== ESP32 CORE DUMP START ====================

Crashed task handle: 0x3fcf69a4, name: '', GDB name: 'process 1070557604'

================== CURRENT THREAD REGISTERS ===================
exccause       0x1d (StoreProhibitedCause)
excvaddr       0x0
epc1           0x42079715
epc2           0x0
epc3           0x0
epc4           0x0
epc5           0x0
epc6           0x0
eps2           0x0
eps3           0x0
eps4           0x0
eps5           0x0
eps6           0x0


==================== CURRENT THREAD STACK =====================
pc             0x40377da5          0x40377da5 <panic_abort+21>
lbeg           0x40056f5c          1074098012
lend           0x40056f72          1074098034
lcount         0x0                 0
sar            0x4                 4
ps             0x60e21             396833
threadptr      <unavailable>
br             <unavailable>
scompare1      <unavailable>
acclo          <unavailable>
acchi          <unavailable>
m0             <unavailable>
m1             <unavailable>
m2             <unavailable>
m3             <unavailable>
expstate       <unavailable>
f64r_lo        <unavailable>
f64r_hi        <unavailable>
f64s           <unavailable>
fcr            <unavailable>
fsr            <unavailable>
a0             0x8037d104          -2143825660
a1             0x3fc96a90          1070164624
a2             0x3fc96afa          1070164730
a3             0x3fc96b27          1070164775
a4             0xa                 10
a5             0x32                50
a6             0x0                 0
a7             0x3fc96a35          1070164533
a8             0x0                 0
a9             0x1                 1
a10            0x3fc96ade          1070164702
a11            0x3fc96ade          1070164702
a12            0xa                 10
a13            0x0                 0
a14            0x2c973d0           46756816
a15            0xffffff            16777215

======================== THREADS INFO =========================
#0  0x40377da5 in panic_abort (details=0x3fc96afa "abort() was called at PC 0x4204a300 on core 0") at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
#1  0x4037d104 in esp_system_abort (details=0x3fc96afa "abort() was called at PC 0x4204a300 on core 0") at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
#2  0x40383c10 in abort () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/abort.c:46
#3  0x4204a303 in task_wdt_isr (arg=<optimized out>) at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/task_wdt.c:176
#4  0x40379478 in _xt_lowint1 () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port/xtensa/xtensa_vectors.S:1118
#5  0x420cb9c2 in cpu_ll_waiti () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/hal/esp32s3/include/hal/cpu_ll.h:182
#6  esp_pm_impl_waiti () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_pm/pm_impl.c:853
#7  0x4204ab74 in esp_vApplicationIdleHook () at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/freertos_hooks.c:63
#8  0x4037e70b in prvIdleTask (pvParameters=<optimized out>) at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/tasks.c:4099
Retrying reading threads information...


       TCB             NAME PRIO C/B  STACK USED/FREE
---------- ---------------- -------- ----------------
0x3fcf69a4                 1070556564/0           76/608
0x3fcf6f1c                 1070557964/0           84/608
0x3fcf359c                 1070539148/18          84/1168
0x3fceea00                 1070521840/18           88/672
0x3fcf7d4c                 1070560060/20           84/688
0x3fcf1750                 1070535488/24           88/608
0x3fcf622c                 1070551580/19           84/624
0x3fcc1834                 1070342776/3         6660/624
0x3fcb788c                 1070336036/1        40796/672
0x3fcf10ac                 1070533788/24           84/608
0x3fcbfbec                 1070316508/10           84/640
0x3fcec150                 1070506304/1          88/1056
0x3fcb11ac                 1070266268/23           84/656
0x3fcf4d88                 1070545784/22           80/640

==================== THREAD 1 (TCB: 0x3fcf69a4, name: '') =====================


==================== THREAD 2 (TCB: 0x3fcf6f1c, name: '') =====================


==================== THREAD 3 (TCB: 0x3fcf359c, name: '') =====================


==================== THREAD 4 (TCB: 0x3fceea00, name: '') =====================


==================== THREAD 5 (TCB: 0x3fcf7d4c, name: '') =====================


==================== THREAD 6 (TCB: 0x3fcf1750, name: '') =====================


==================== THREAD 7 (TCB: 0x3fcf622c, name: '') =====================


==================== THREAD 8 (TCB: 0x3fcc1834, name: '') =====================


==================== THREAD 9 (TCB: 0x3fcb788c, name: '') =====================


==================== THREAD 10 (TCB: 0x3fcf10ac, name: '') =====================


==================== THREAD 11 (TCB: 0x3fcbfbec, name: '') =====================


==================== THREAD 12 (TCB: 0x3fcec150, name: '') =====================


==================== THREAD 13 (TCB: 0x3fcb11ac, name: '') =====================


==================== THREAD 14 (TCB: 0x3fcf4d88, name: '') =====================



======================= ALL MEMORY REGIONS ========================
Name   Address   Size   Attrs
.rtc.text 0x600fe000 0x0 RW
.rtc.dummy 0x600fe000 0x0 RW
.rtc.force_fast 0x600fe000 0x0 RW
.rtc.force_slow 0x50000010 0x0 RW
.iram0.vectors 0x40374000 0x403 R XA
.iram0.text 0x40374404 0x1138f R XA
.dram0.data 0x3fc957a0 0x57d0 RW A
.noinit 0x3fc9af70 0x0 RW
.flash.text 0x42000020 0xd10c7 R XA
.flash.appdesc 0x3c0e0020 0x100 R  A
.flash.rodata 0x3c0e0120 0x47a4c RW A
.iram0.text_end 0x40385793 0x0 RW
.iram0.bss 0x40385794 0x0 RW
.dram0.heap_start 0x3fcae2a0 0x0 RW
.coredump.tasks.data 0x3fcf69a4 0x158 RW
.coredump.tasks.data 0x3fcf6730 0x260 RW
.coredump.tasks.data 0x3fcf6f1c 0x158 RW
.coredump.tasks.data 0x3fcf6ca0 0x260 RW
.coredump.tasks.data 0x3fcf359c 0x158 RW
.coredump.tasks.data 0x3fcf30f0 0x490 RW
.coredump.tasks.data 0x3fceea00 0x158 RW
.coredump.tasks.data 0x3fcee740 0x2a0 RW
.coredump.tasks.data 0x3fcf7d4c 0x158 RW
.coredump.tasks.data 0x3fcf7a80 0x2b0 RW
.coredump.tasks.data 0x3fcf1750 0x158 RW
.coredump.tasks.data 0x3fcf14d0 0x260 RW
.coredump.tasks.data 0x3fcf622c 0x158 RW
.coredump.tasks.data 0x3fcf5fa0 0x270 RW
.coredump.tasks.data 0x3fcc1834 0x158 RW
.coredump.tasks.data 0x3fcc3000 0x270 RW
.coredump.tasks.data 0x3fcb788c 0x158 RW
.coredump.tasks.data 0x3fcc1580 0x2a0 RW
.coredump.tasks.data 0x3fcf10ac 0x158 RW
.coredump.tasks.data 0x3fcf0e30 0x260 RW
.coredump.tasks.data 0x3fcbfbec 0x158 RW
.coredump.tasks.data 0x3fcbf950 0x280 RW
.coredump.tasks.data 0x3fcec150 0x158 RW
.coredump.tasks.data 0x3fcebd10 0x420 RW
.coredump.tasks.data 0x3fcb11ac 0x158 RW
.coredump.tasks.data 0x3fcb0f00 0x290 RW
.coredump.tasks.data 0x3fcf4d88 0x158 RW
.coredump.tasks.data 0x3fcf4af0 0x280 RW

===================== ESP32 CORE DUMP END =====================
===============================================================

Both show the same behavior: the 🐶 (task watchdog) is not fed. But I still don't know where it needs to be fed more often.
As far as I can see, you have only configured the NRF. There is no CMT and no MqTT.

What about a display, is one configured?

Does your ESP crash by itself or only if the WebUI is open?

@juepi
Author

juepi commented Aug 29, 2024

Hi Lukas,

Both show the same behavior: the 🐶 (task watchdog) is not fed. But I still don't know where it needs to be fed more often. As far as I can see, you have only configured the NRF. There is no CMT and no MqTT.

Yes, MQTT is used, but at the point when the coredumps were taken, it was not working yet.
EDIT: see first screenshot here

What about a display, is one configured?

No display connected, only the NRF radio.

Does your ESP crash by itself or only if the WebUI is open?

Uh, hard to tell. What I can say for sure is that it only happens at startup/reboot. As soon as it switches into a "stable mode", it seems to run perfectly well (for at least a week, as far as I can tell so far).

yours,
Juergen

P.S.: I added a coredump taken in the working state; maybe it helps?
2024-08-29_23-17-10_v0.8.141_opendtufusion_coredump.zip
Note: at the time of this coredump, Inverter0 is offline and Inverter1 is powered down (battery drained).

@juepi
Copy link
Author

juepi commented Aug 30, 2024

One more thing concerning your question about accessing the WebUI: if you suspect the new AsyncWebserver in 0.8.141 of causing the issue, I also had the same problem with 0.8.140.

yours,
Juergen

@Gubi2023

According to your screenshot, you have very high MqTT traffic: almost 3000 Tx in 4 min! Is that normal, or could the DTU be choking on it?
[screenshot]

@juepi
Author

juepi commented Aug 30, 2024

According to your screenshot, you have very high MqTT traffic: almost 3000 Tx in 4 min! Is that normal, or could the DTU be choking on it?

Well, it is somewhat above my average: over the last 24 h I get about 400 TX/min, but the inverters were disabled during the night, so seen that way, around 600 TX/min is plausible. I have configured an MQTT and inverter interval of 5 seconds, which matches the data interval of my smart meter. Zero-export control runs in FHEM (a Perl script), and the limits are delivered to AhoyDTU via MQTT, which works very well in this setup.
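The plausibility argument can be written out. Assuming, purely for illustration (the thread does not say how long the inverters were disabled), that the traffic is concentrated in about 16 of the 24 hours, a 24 h average of 400 TX/min implies a daytime rate of about 600 TX/min; with the 5 s inverter interval (12 cycles per minute), that is roughly 50 messages per cycle. For comparison, almost 3000 Tx in 4 min is about 750 TX/min:

$$
r_{\text{day}} \approx 400\,\tfrac{\text{TX}}{\text{min}} \cdot \frac{24\,\text{h}}{16\,\text{h}} = 600\,\tfrac{\text{TX}}{\text{min}},
\qquad
\frac{600\,\text{TX/min}}{12\,\text{cycles/min}} = 50\,\tfrac{\text{TX}}{\text{cycle}},
\qquad
\frac{3000\,\text{TX}}{4\,\text{min}} = 750\,\tfrac{\text{TX}}{\text{min}}.
$$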

As I said: once the thing is running after the startup reboot loop, it runs very stably. It even worked on an ESP8266, although it had stability problems there in the end, which is why I switched to the ESP32-S3.

Regards,
Jürgen

@lumapu
Owner

lumapu commented Aug 30, 2024

Unfortunately, the third coredump shows the same picture. Could you, as a test, set the MqTT interval to 0? That does not mean that no data is delivered; instead, data is published whenever new data is available.

@juepi
Author

juepi commented Aug 31, 2024

I was just about to change it, but it is already set to 0! Shame on me, sorry for the misinformation!

@juepi
Author

juepi commented Sep 1, 2024

Morning,

I just noticed an interesting behavior: after rebooting my server (including the MQTT broker; WiFi, NTP and name resolution stayed online), the same behavior occurs! AhoyDTU goes into the reboot loop and "recovers" after a while.

Here is another dump:
2024-09-01_09-52-34_v0.8.141_opendtufusion_coredump.zip

It was taken after an "MQTT broker offline" reboot loop (at that point, however, the broker was already back online and AhoyDTU had recovered).

Possibly a buffer overflow after boot, because MQTT messages are queued for sending but the broker is not connected yet?

Regards,
Jürgen

@Gubi2023

Gubi2023 commented Sep 1, 2024

This looks very much like what we are all struggling with right now:
MqTT fills up and can't get the packets sent, and Ahoy then collapses because it runs out of memory.

It seems to be the same problem.

@juepi
Author

juepi commented Sep 1, 2024

Aha. What I find interesting is that for me it apparently only becomes a problem at boot. As I said, once it's running, it runs like clockwork.

[screenshot]
No problems at all, even with heavy control activity.

Regards,
Jürgen

@Gubi2023

Gubi2023 commented Sep 1, 2024

Well, for me it's over after 24 h at the latest; then the DTU restarts with a task watchdog reset. Unfortunately, my ESP8266 with the -min config also keeps crashing with an "Exception".

@juepi
Author

juepi commented Sep 1, 2024

Well, for me it's over after 24 h at the latest; then the DTU restarts with a task watchdog reset

No, 0.8.141 has already run for almost a week without problems for me; it really is only the boot (1-2 minutes) that misbehaves, or when the MqTT broker goes down.

Regards,
Jürgen

@juepi juepi changed the title "Task watchdog" Reboot loop with 0.8.140 and 0.8.141 / INT Pin status "unknown" for nRF24L01? "Task watchdog" Reboot loop at startup with 0.8.140 and 0.8.141 / INT Pin status "unknown" for nRF24L01? Sep 4, 2024
@juepi
Author

juepi commented Sep 4, 2024

Averaged over the last 3 days, I have about 560 MqTT TX per minute. No AhoyDTU resets during that time.

Regards,
Jürgen

@juepi
Author

juepi commented Oct 3, 2024

Update: I've just upgraded to 0.8.150, and both described problems (the reboot loop at firmware startup and the "unknown" INT pin status) no longer occur. Thanks, Lukas! 👍

Regards,
Jürgen

P.S.: 0.8.151 also works without the mentioned problems 😉

@juepi juepi closed this as completed Oct 8, 2024
4 participants