Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SD Card access causes crash in development branch, but not in release branch (CAUSED BY WATCHDOG - see below) #21473

Closed
13 of 14 tasks
btsimonh opened this issue May 23, 2024 · 6 comments · Fixed by #21486
Closed
13 of 14 tasks

Comments

@btsimonh
Copy link
Contributor

PROBLEM DESCRIPTION

I was experiencing random crashes on esp32cam writing and reading from SD card.

I found a way to reliably reproduce the crash on esp32cam and esp32s3cam boards.

Further testing on the S3 board only indicates that I did not see a crash with the release version
(tasmota32s3.bin | http://ota.tasmota.com/tasmota32/release/tasmota32s3.bin | 1956k | 20240514 16:12)

but does crash reliably with the development version:
(tasmota32s3.bin | http://ota.tasmota.com/tasmota32/tasmota32s3.bin | 1973k | 20240522 17:19

Crashes happen on writing files, listing folders and reading files, in unmodified firmware.

I have not done equivalent testing of internal flash.

REQUESTED INFORMATION

Make sure your have performed every step and checked the applicable boxes before submitting your issue. Thank you!

  • Read the Contributing Guide and Policy and the Code of Conduct
  • Searched the problem in issues
  • Searched the problem in discussions
  • Searched the problem in the docs
  • Searched the problem in the chat
  • Device used (e.g., Sonoff Basic): esp32cam (aithinker style) and esp32S3 with cam module
  • Tasmota binary firmware version number used: 14.0.0.1
    • Pre-compiled
    • Self-compiled
  • Flashing tools used: Web upgrade
  • Provide the output of command: Backlog Template; Module; GPIO 255:
  Configuration output here:
Backlog Template; Module; GPIO 255
12:48:40.277 SRC: WebConsole from 192.168.1.174
12:48:40.278 CMD: Grp 0, Cmd 'BACKLOG', Idx 1, Len 26, Pld -99, Data 'Template; Module; GPIO 255'
12:48:40.316 SRC: Backlog
12:48:40.319 CMD: Grp 0, Cmd 'TEMPLATE', Idx 1, Len 0, Pld -99, Data ''
12:48:40.324 RSL: RESULT = {"NAME":"ESP32S3-FNK085","GPIO":[1,1,1,1,5024,5056,5120,5152,5090,5089,5091,5088,5092,5184,1,4992,5095,5094,5093,1,1,1,0,0,0,0,0,8800,8832,8864,1,1,1,1,1,1,1,1],"FLAG":0,"BASE":1}
12:48:40.522 SRC: Backlog
12:48:40.523 CMD: Grp 0, Cmd 'MODULE', Idx 1, Len 0, Pld -99, Data ''
12:48:40.525 RSL: RESULT = {"Module":{"0":"ESP32S3-FNK085"}}
12:48:40.725 SRC: Backlog
12:48:40.727 CMD: Grp 0, Cmd 'GPIO', Idx 1, Len 3, Pld 255, Data '255'
12:48:40.733 RSL: RESULT = {"GPIO0":{"0":"None"},"GPIO1":{"0":"None"},"GPIO2":{"0":"None"},"GPIO3":{"0":"None"},"GPIO4":{"5024":"CAM_SIOD"},"GPIO5":{"5056":"CAM_SIOC"},"GPIO6":{"5120":"CAM_VSYNC"},"GPIO7":{"5152":"CAM_HREF"},"GPIO8":{"5090":"CAM_DATA"},"GPIO9":{"5089":"CAM_DATA"},"GPIO10":{"5091":"CAM_DATA"},"GPIO11":{"5088":"CAM_DATA"},"GPIO12":{"5092":"CAM_DATA"},"GPIO13":{"5184":"CAM_PCLK"},"GPIO14":{"0":"None"},"GPIO15":{"4992":"CAM_XCLK"},"GPIO16":{"5095":"CAM_DATA"},"GPIO17":{"5094":"CAM_DATA"},"GPIO18":{"5093":"CAM_DATA"},"GPIO19":{"0":"None"},"GPIO20":{"0":"None"},"GPIO21":{"0":"None"},"GPIO33":{"0":"None"},"GPIO34":{"0":"None"},"GPIO35":{"0":"None"},"GPIO36":{"0":"None"},"GPIO37":{"0":"None"},"GPIO38":{"8800":"SDIO CMD"},"GPIO39":{"8832":"SDIO CLK"},"GPIO40":{"8864":"SDIO D0"},"GPIO41":{"0":"None"},"GPIO42":{"0":"None"},"GPIO43":{"0":"None"},"GPIO44":{"0":"None"},"GPIO45":{"0":"None"},"GPIO46":{"0":"None"},"GPIO47":{"0":"None"},"GPIO48":{"1376":"WS28121"}}
12:48:45.161 WIF: Checking connection..
  • If using rules, provide the output of this command: Backlog Rule1; Rule2; Rule3:
  Rules output here:

  • Provide the output of this command: Status 0:
  STATUS 0 output here:
12:49:36.554 CMD: status 0
12:49:36.555 SRC: WebConsole from 192.168.1.174
12:49:36.556 CMD: Grp 0, Cmd 'STATUS', Idx 1, Len 1, Pld 0, Data '0'
12:49:36.559 RSL: STATUS = {"Status":{"Module":0,"DeviceName":"Tasmota","FriendlyName":["Tasmota"],"Topic":"tasmota_9CE64C","ButtonTopic":"0","Power":"0","PowerLock":"0","PowerOnState":3,"LedState":1,"LedMask":"FFFF","SaveData":1,"SaveState":1,"SwitchTopic":"0","SwitchMode":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"ButtonRetain":0,"SwitchRetain":0,"SensorRetain":0,"PowerRetain":0,"InfoRetain":0,"StateRetain":0,"StatusRetain":0}}
12:49:36.565 RSL: STATUS1 = {"StatusPRM":{"Baudrate":115200,"SerialConfig":"8N1","GroupTopic":"tasmotas","OtaUrl":"http://ota.tasmota.com/tasmota32/tasmota32s3.bin","RestartReason":"Software reset CPU","Uptime":"0T00:12:55","StartupUTC":"2024-05-23T11:36:41","Sleep":1,"CfgHolder":4617,"BootCount":158,"BCResetTime":"2023-08-14T12:16:33","SaveCount":400}}
12:49:36.572 RSL: STATUS2 = {"StatusFWR":{"Version":"14.0.0.1(522f6c5-tasmota32)","BuildDateTime":"2024-05-22T14:31:07","Core":"3_0_0","SDK":"5.1.3.240430","CpuFrequency":240,"Hardware":"ESP32-S3 v0.1","CR":"379/699"}}
12:49:36.577 RSL: STATUS3 = {"StatusLOG":{"SerialLog":4,"WebLog":4,"MqttLog":0,"SysLog":0,"LogHost":"","LogPort":514,"SSId":["MYSSID",""],"TelePeriod":90,"Resolution":"558180C0","SetOption":["00008009","2805C80001000600003C5A0A192800000000","00000080","00006000","00004002","00000000"]}}
12:49:36.587 RSL: STATUS4 = {"StatusMEM":{"ProgramSize":1972,"Free":907,"Heap":198,"StackLowMark":3,"PsrMax":0,"PsrFree":0,"ProgramFlashSize":8192,"FlashSize":8192,"FlashChipId":"1740C8","FlashFrequency":80,"FlashMode":"QIO","Features":["0809","9F9AD7DF","0015A001","B7F7BFCF","05DA9BC4","E0360DC7","480840D2","20200000","D4BC482D","810A80B1","00000014"],"Drivers":"1,2,!3,4,!5,7,!8,9,10,11,12,!14,!16,!17,!20,!21,!24,26,!27,29,!34,!35,38,50,52,!59,!60,62,!63,!66,!67,!68,!73,82,!86,!87,!88,!121","Sensors":"1,2,3,5,6,7,8,9,10,11,12,13,14,15,17,18,19,20,21,22,26,31,34,37,39,40,42,43,45,51,52,55,56,58,59,64,66,67,74,85,92,95,98,103,105,109,127","I2CDriver":"7,8,9,10,11,12,13,14,15,17,18,20,24,29,31,36,41,42,44,46,48,58,62,65,69,76,77,82"}}
12:49:36.594 RSL: STATUS5 = {"StatusNET":{"Hostname":"tasmota-9CE64C-1612","IPAddress":"192.168.1.198","Gateway":"192.168.1.254","Subnetmask":"255.255.255.0","DNSServer1":"192.168.1.254","DNSServer2":"0.0.0.0","Mac":"34:85:18:9C:E6:4C","IP6Global":"2a00:23c6:661f:c801:3685:18ff:fe9c:e64c","IP6Local":"fe80::3685:18ff:fe9c:e64c%st1","Ethernet":{"Hostname":"","IPAddress":"0.0.0.0","Gateway":"0.0.0.0","Subnetmask":"0.0.0.0","DNSServer1":"192.168.1.254","DNSServer2":"0.0.0.0","Mac":"00:00:00:00:00:00","IP6Global":"","IP6Local":""},"Webserver":2,"HTTP_API":1,"WifiConfig":4,"WifiPower":19.0}}
12:49:36.600 RSL: STATUS6 = {"StatusMQT":{"MqttHost":"","MqttPort":1883,"MqttClientMask":"DVES_%06X","MqttClient":"DVES_9CE64C","MqttUser":"DVES_USER","MqttCount":0,"MAX_PACKET_SIZE":1200,"KEEPALIVE":30,"SOCKET_TIMEOUT":4}}
12:49:36.604 RSL: STATUS7 = {"StatusTIM":{"UTC":"2024-05-23T11:49:36Z","Local":"2024-05-23T12:49:36","StartDST":"2024-03-31T02:00:00","EndDST":"2024-10-27T03:00:00","Timezone":"+01:00","Sunrise":"04:58","Sunset":"20:35"}}
12:49:36.612 RSL: STATUS10 = {"StatusSNS":{"Time":"2024-05-23T12:49:36"}}
12:49:36.615 BRY: GC from 7858 to 4292 bytes, objects freed 9/48 (in 0 ms) - slots from 53/91 to 52/91
12:49:36.620 RSL: STATUS11 = {"StatusSTS":{"Time":"2024-05-23T12:49:36","Uptime":"0T00:12:55","UptimeSec":775,"Heap":197,"SleepMode":"Dynamic","Sleep":1,"LoadAvg":1002,"MqttCount":0,"Berry":{"HeapUsed":4,"Objects":48},"POWER":"OFF","Dimmer":15,"Color":"262626","HSBColor":"0,0,15","Channel":[15,15,15],"Scheme":0,"Width":1,"Fade":"OFF","Speed":1,"LedTable":"ON","Wifi":{"AP":1,"SSId":"MYSSID","BSSId":"AC:B6:87:C7:0E:B8","Channel":6,"Mode":"HT20","RSSI":100,"Signal":-48,"LinkCount":1,"Downtime":"0T00:00:04"}}}
12:49:36.628 RSL: STATUS12 = {"StatusSTK":{"Exception":1070552992,"Reason":"Unknown","EPC":"4037ca8a","EXCVADDR":"00000000","CallChain":["4037ca87","420afb15","421363bb"]}}
12:49:45.159 WIF: Checking connection...
  • Set weblog to 4 and then, when you experience your issue, provide the output of the Console log:
  Console output here:
12:36:09.174 HTP: Main Menu
12:36:15.321 HTP: Tools
12:36:17.450 HTP: Manage File system
12:36:17.451 UFS: File '/' download
12:36:17.453 UFS: File '/' to download is directory
12:36:17.924 WIF: Checking connection...
12:36:19.299 HTP: Manage File system
12:36:19.300 UFS: File '/timelapse' download
12:36:19.303 UFS: File '/timelapse' to download is directory
12:36:36.200 HTP: Manage File system
12:36:36.201 UFS: File '/timelapse/tl' download
12:36:36.210 UFS: File '/timelapse/tl' to download is directory




Backtracxx����x����������xx��������x����xxxx����xx���x������xxxxx������x����xxxx�����x���x�����x�����������xx���x����xxxx����xxxxxxxxxxxxxxxx���x��xxxx���xxx��x���������x��x���xxx���xxx��x���x��x��xxxx��xxxxxxx��xxxxxxxx���
00:00:00.252 CMD: Fall back to serial port, no SOF packet detected on USB port
00:00:00.253 HDW: ESP32-S3 v0.1 
00:00:00.306 UFS: FlashFS mounted with 4376 kB free
00:00:00.324 CFG: Loaded from File, Count 399

TO REPRODUCE

unzip the attached zip onto an SD card, insert into the device.

Monitor the serial of the device.

Using the menus, go to Tools/Manage File System

click on timelapse

click on tl (contains ~1600 files)

files will begin to be listed in the browser window. You may get as far as 45-75 files listed before the device crashes.

Note that this is a way to reproduce, but the issue affects general read/write of the SD card, and is not necessarily specifically related to listing files. e.g. it was crashing writing files every 10s to the folder. Backtraces indicated it was in different functions, e.g. sometimes fclose() on a file write. But this reproduction does not involve writing...

EXPECTED BEHAVIOUR

The device should not crash when accessing the SD card.

SCREENSHOTS

n/a

ADDITIONAL CONTEXT

use the contents of this zip file to easily reproduce:

timelapse.zip

(Please, remember to close the issue when the problem has been addressed)

@arendst
Copy link
Owner

arendst commented May 23, 2024

FYI any exception with Reason unknown is likely caused by watchdogs enabled recently in dev branch.

To stop chasing windmills I suggest you disable the watchdog feature and continue with what you want to do.

EDIT: Add this #undef USE_ESP32_WDT to your user_config_override.h

@btsimonh
Copy link
Contributor Author

you are of course correct. adding this to user_config_override.h, and rebuilding for my esp32cam, it's listed 430+ files so far (better than 45).
We could probably do with having a configuration to turn it off. There are always going to be odd circumstances which can't be tested for and so be accounted for with calls to tickle the watchdog. At least then we could continue to use the 'current' version whilst a strange issue is mitigated in some other way without a rebuild.

@btsimonh btsimonh changed the title SD Card access causes crash in development branch, but not in release branch SD Card access causes crash in development branch, but not in release branch (CAUSED BHY WATCHDOG - see below) May 23, 2024
@btsimonh btsimonh changed the title SD Card access causes crash in development branch, but not in release branch (CAUSED BHY WATCHDOG - see below) SD Card access causes crash in development branch, but not in release branch (CAUSED BY WATCHDOG - see below) May 23, 2024
@btsimonh
Copy link
Contributor Author

it dies in a different way now.... no crash log on the serial, just hangs with no network access, and serial logs turned themselves off... but then I'm giving it hell, so I'm not surprised.

@btsimonh
Copy link
Contributor Author

btsimonh commented May 24, 2024

re-opening - the watchdog triggers in the reproduction above because of listing a very large folder, and so hanging the main loop - which we could solve with some yield(). But with further testing any SD access can cause a main loop hang.
I will do some further investigation, and come up with another reproduction; maybe some Berry that will cause it.
It seems fine on my S3 board, but not on a 'normal' esp32cam board. I will also re-test dev vs release.

Edit - a self-build release build stalls main loop - first time at first file download. 2nd time after 471 file downloads.

@btsimonh btsimonh reopened this May 24, 2024
@btsimonh
Copy link
Contributor Author

I think a bug in the UFSServe function. Possibly streaming direct from file to socket is not a good idea...
After more testing, I'll PR, and include in the PR a fix for listing large folders with Watchdog on (the original reproduction code).

@Jason2866
Copy link
Collaborator

The reason why the WDT is enabled is to find code parts which blocks Tasmota (way too long). Goal is to change long blocking code in "normal" working code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants