Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: reduce file system corruption from standalone Lua scripts #5148

Merged
merged 1 commit into from
Jun 21, 2024

Conversation

3djc
Copy link
Collaborator

@3djc 3djc commented Jun 10, 2024

This possibly fixes LUA sd corruption. This PR is raised to allow easier testing

While at it, use more sdio_check_card_state_with_timeout();

Finding by @philmoz

This fixes #5077

@frankiearzu
Copy link
Contributor

frankiearzu commented Jun 14, 2024

@3djc should we add the check also when the buffer is not aligned on reads??? The code before the HAL change had the check on both. The write has it on both (aligned and not aligned buffers)
Another option is to move that code inside the read_dma, and write_dma to not have that much code duplications.

image

Another difference with the "pre-hal" code is that that the disk status check (check the TRANSFER STATUS) is done if the read/write waiting for completion fails (currently we do only if succeeds).. the current code is short-cutting to "return" exit if the read/write waiting times out (failed).
Probably on "Fail" is more important to check the status.

Other difference is the transfer status on the original code is in an infinite loop until succeed or fail the transfer (not time out), new code is with a timeout. should we use the version of the sdio_check_card.. without a timeout??

From the original HAL ticket: (also seems that all that code should be inside the read_dma/write_dma).
image

Tried my proposed changes on my local branch and works, but did not solve the original problem with the LOGS corruption, but everything still works the same as before.

@frankiearzu
Copy link
Contributor

frankiearzu commented Jun 14, 2024

On my local branch, i move the transfer check to inside the read_dma/write_dma:

  1. Replicated the original logic... the transfer wait is done if the waiting for read/write completed timeout (as in the pre-hal code). No shorcut exit.
  2. Loop without timeout... until the transfer either succeed or failed.
  3. Removed the transfer checks from read/write since now that logic is inside read_dma/write_dma.. removing many cut/paste parts of the same code.

image
image

@raphaelcoeffic
Copy link
Member

but did not solve the original problem with the LOGS corruption

Assuming that this PR "fixes" (actually, it masks it, but well) the original issue, I don't think adding all the clutter back is gonna help anyone here.

@frankiearzu
Copy link
Contributor

@raphaelcoeffic agreed... this probably just make it harder to reproduce, but is not the fix to the root cause problem.
Will do a proposed change on the timing of the DMA (proposed on the other ticket), but haven't had time to test (workdays :-)

@raphaelcoeffic
Copy link
Member

Frankly, I'm still not convinced the SD block layer is at fault. Everything points at the way we use FatFs, and FatFs behaviour.

@richardclli
Copy link
Collaborator

@raphaelcoeffic agreed... this probably just make it harder to reproduce, but is not the fix to the root cause problem. Will do a proposed change on the timing of the DMA (proposed on the other ticket), but haven't had time to test (workdays :-)

You can do a quick test to lower the SDIO speed by changing a line, if it still fails, then it is not about DMA timing anyway.

@frankiearzu
Copy link
Contributor

@richardclli I did lower the timing and still failed. I think probably put this to rest until we get others to report a similar problem.
DSM forward prog seems to produce the error really quick,, more repetitions with other LUA scripts. Changed the forward programming script to stop writing to the logs in the meantime.

@pfeerick pfeerick added this to the 2.10.2 milestone Jun 21, 2024
@pfeerick pfeerick added the bug 🪲 Something isn't working label Jun 21, 2024
@pfeerick pfeerick changed the title feat: using FF_FS_TINY=1 fix: reduce file system corruption from standalone Lua scripts Jun 21, 2024
@pfeerick pfeerick merged commit 1867e73 into main Jun 21, 2024
45 checks passed
@pfeerick pfeerick deleted the 3djc/FF_FS_TINY branch June 21, 2024 00:09
pfeerick pushed a commit that referenced this pull request Jun 21, 2024
Standalone Lua such as the DSM Forward Programming tool do a lot of file
system writing (logging) and appears to overrun some buffer in FafFs,
writing file data right into the FAT, thus corrupting the file system. 

Using FF_FS_TINY=1 seems to mitigate this significantly, but does not 
appear to completely resolve this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🪲 Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SDCard corruption of files and directory/folders
5 participants