Take 2 on `#[ram]` soundness #1677

wetheredge · 2024-06-14T02:55:48Z

Submission Checklist 📝

I have updated existing examples or added new ones (if applicable).
I have used cargo xtask fmt-packages command to ensure that all changed code is formatted correctly.
My changes were added to the CHANGELOG.md in the proper section.
My changes are in accordance to the esp-rs API guidelines

Extra:

I have read the CONTRIBUTING.md guide and followed its instructions.

Pull Request Details 📖

Description

Closes #1110, closes #1650, closes #1649

New PR as this is a full rewrite. It also fixes #1650 since both #[ram(persistent)] and #[ram(zeroed)] require the same check that the type can take an all-zero bit pattern. I used bytemuck::Zeroable since it seems like the most widely used, rather than implement. It did mean adding bytemuck as a dependency of esp-hal, but I used version 1.0.0 to maximize compatibility with whatever version is probably already in users' dependency graphs.

Error message for usage on non-Zeroable types

error[E0277]: the trait bound `NonZero<u8>: bytemuck::zeroable::Zeroable` is not satisfied
   --> src/bin/ram.rs:40:18
    |
40  | static NON_ZERO: NonZeroU8 = NonZeroU8::MIN;
    |                  ^^^^^^^^^ the trait `bytemuck::zeroable::Zeroable` is not implemented for `NonZero<u8>`
    |
    = help: the following other types implement trait `bytemuck::zeroable::Zeroable`:
              bool
              char
              isize
              i8
              i16
              i32
              i64
              i128
            and 73 others
note: required by a bound in `assert_is_zeroable`
   --> …/esp-hal/esp-hal/src/lib.rs:229:40
    |
229 |     pub const fn assert_is_zeroable<T: bytemuck::Zeroable>() {}
    |                                        ^^^^^^^^^^^^^^^^^^ required by this bound in `assert_is_zeroable`

For more information about this error, try `rustc --explain E0277`.

If exposing the #[doc(hidden)] function name in the error message is undesirable, I use the method I did in #1649.

Testing

The updated ram example ran on an S3 dev board. I do not have any of the RISC-V ESPs, so someone else will need to test that once it's implemented.

Additional To-dos

Update esp-riscv-rt

esp-hal/src/soc/esp32s3/mod.rs

wetheredge · 2024-06-22T00:54:53Z

It will now only zero persistent ram after initial boot, since, I think¹ that's the only time the RTC ram is not preserved. IMO, this should be expanded to all the reset reasons that could theoretically happen before the init had finished. It seems to me that that would be anything not caused by a watchdog or software (for S3: brown out, clock glitch, efuse error, and usb uart/jtag resets).

I also updated esp-riscv-rt to match the xtensa behavior. I don't have any risc-v esps to test on, but it did seem to work for esp32c3 in qemu.

Additionally, I noticed the rv-init-data and rv-init-rtc-data features (and the esp-riscv-rt features they enable) seem to have only been used for the old direct boot support. I also attempted to find uses of them in a GitHub-wide search and did not see any outside of forks of esp-hal. How do you feel about removing these features and perhaps merging the existing zero-rtc-fast-bss and new zero-rtc-fast-persistent into one rtc-ram feature?

Based on the Reset and Clock chapter of the ESP32, ESP32-S3, and ESP32-C3 reference manuals ↩

esp-hal-procmacros/src/lib.rs

bjoernQ · 2024-06-24T08:20:05Z

Additionally, I noticed the rv-init-data and rv-init-rtc-data features (and the esp-riscv-rt features they enable) seem to have only been used for the old direct boot support. I also attempted to find uses of them in a GitHub-wide search and did not see any outside of forks of esp-hal. How do you feel about removing these features and perhaps merging the existing zero-rtc-fast-bss and new zero-rtc-fast-persistent into one rtc-ram feature?

I tend to agree - maybe good to hear others opinion on that, too

bjoernQ · 2024-06-25T08:04:45Z

Looks quite good overall - I guess after adding a CHANGELOG.md entry this should be good to go

One thing maybe worth considering would be to have a (doc-hidden) function in esp-hal like should_reset_rtc_persisted_ram (naming is debatable) which decides on the reset reasons for both - Xtensa and RISC-V. Would make it easier to change the decision when to zero memory (if we need to reconsider the condition). Not mandatory - just an idea

jessebraham

This is looking quite good to me, thanks for taking the time to tackle those issues. Once CHANGELOG.md has been updated I think this should be good to go. I'll give @bjoernQ the final say on this, though.

wetheredge · 2024-06-26T02:11:11Z

Great. My last concern is with the edge case of an externally triggered RTC RAM-preserving reset (brown out, power glitch, jtag reset, etc.) occurring very early in the first boot such that the zero initialization gets skipped, causing undefined behavior. If this is a worry, which kinds of resets should be added to the list to trigger zeroing?

I wrote the condition in assembly to match the rest of the risc-v init, since I'm not sure how it interacts with the soundness issues that prompted the decision to avoid Rust there.

bjoernQ · 2024-06-26T06:49:29Z

Probably all the reset reasons which are chip-reset or system-reset should always trigger zeroing:

(from C6 TRM)

I wrote the condition in assembly to match the rest of the risc-v init, since I'm not sure how it interacts with the soundness issues that prompted the decision to avoid Rust there.

Yes - makes sense. In theory when being cautious writing such a function it should be okay but sure - in assembly it's easier to know what really happens 👍

wetheredge · 2024-07-01T05:41:24Z

If that looks good, I'll update the changelog and rebase.

The risc-v code could also use some testing on real chips—I've only been able to test on an S3 and qemu.

bjoernQ · 2024-07-01T08:07:48Z

Seems like the description in the TRM fooled me a bit and "System Reset: resets the whole digital system, including LP system." doesn't mean LP/RTC RAM is reset. So only "Chip Reset: resets the whole chip." seems to be a reason to zero the memory. (Sorry for the confusion)

Maybe we should just assume "Brown-out system reset" is "Chip Reset" (it can also be System Reset according to the TRM) and only zero out memory for these two reasons

wetheredge · 2024-07-03T02:55:48Z

I interpreted that to mean that a minor brown out could cause a system reset (preserving RTC RAM) and would give a reset code of 0x0F, while a more severe brownout could cause a chip reset with code 0x01. If it could give code 0x0F even after causing a chip reset, then I agree—definitely need to init after that.

I had expanded the list to guard against something like this:

Initial boot starts (reset reason = ChipPowerOn)
While the bootloader is running, something triggers a reset
Second boot starts and completes with a non-ChipPowerOn reset reason, meaning persistent ram is not zeroed
Undefined behavior

I am unsure which resets could occur at step 2, hence the long list. Perhaps anything that could trigger a reset that early would result in ChipPowerOn? Would a JTAG reset be possible that early?

Perhaps the time between a reset that could cause the above sequence becoming possible and the ram init finishing is simply so short that it should not be considered. I personally dislike that solution as a soundness-purist, but I'll follow your lead there.

bjoernQ · 2024-07-03T08:20:58Z

t.b.h. I guess the above scenario is not completely impossible in a development setup but maybe unlikely in production - not too sure

At least not zeroing on e.g. WDT resets can be useful, I guess. On the other hand, if code is manipulating data stored in RTC RAM and gets interrupted by a WDT reset (or any other data preserving reset) the data might be invalid then. I start to wonder if there would be a 100% solution other than e.g. using checksums?

wetheredge · 2024-07-04T05:16:56Z

Oh yeah, interrupted writes are an issue.

What about this?

Update the docs to warn about the potential for resets during writes and to avoid resets immediately after first boot
Require persistent statics to impl AnyBitPattern
Zero after ChipPowerOn and SysBrownOut
- Maybe also some set of SysClkGlitch, CoreEfuseCrc, and/or CorePwrGlitch?

Also, from reading this ESP-IDF source, it looks like some cases of brownout resets are detected by the IDF, so esp-hal will report them as CoreSW. Could those be the system brown-out resets described in the TRM?

Shall I split the fix for #1650 out into its own PR so that isn't held up by finalizing these specifics?

bjoernQ · 2024-07-04T06:30:01Z

Update the docs to warn about the potential for resets during writes and to avoid resets immediately after first boot

Yes, we probably should explain these things more

Require persistent statics to impl AnyBitPattern

That is basically just MaybeUninit, right? But I agree it's probably the only safe thing to do then

Zero after ChipPowerOn and SysBrownOut

Yes, sounds good.

Regarding splitting the PR: Ideally, I would love to see all of this in the next release - not sure if splitting out parts of the PR helps or causes more work for you. I'm fine with both I guess

wetheredge · 2024-07-06T19:37:26Z

I noticed this note on SocResetReason::ChipResetReason:

In ESP-IDF this value (0x01) can also be ChipBrownOut or ChipSuperWdt, however that is not really compatible with Rust-style enums.

Combined with this part of the IDF, I think that a chip level brownout will not give SysBrownOut. Based on this assumption, I'm only initializing after ChipPowerOn.

That is basically just MaybeUninit, right?

I guess it's close, yeah. But this can be used without any unsafe:

#[ram(rtc_fast, persistent)]
static BOOT_MODE: AtomicU8 = AtomicU8::new(0);

match BOOT_MODE.load(order) {
	// ...
}

wetheredge · 2024-07-06T23:45:58Z

Well, I missed the note in more recent versions of the AnyBitPattern docs that explicitly forbids interior mutability because of bytemuck::cast_ref, etc. I've replaced it with a new esp_hal::Persistable trait that matches the actual requirements for #[ram(persistent)]. It's implemented for a minimal set of integers, floats, portable_atomic::Atomic*, and arrays that should cover most usage.

bjoernQ · 2024-07-08T08:42:34Z

Looks good but I'm on vacation this week -not sure if and when I will get to this before next week. So, if anyone likes to review this: feel free

jessebraham

Sorry forgot about that 😅

@MabezDev would you mind taking a look at the changes to the assembly in riscv-rt? I think everything looks okay, just want another set of eyes on that specifically.

MabezDev

Sorry for the delay on my end, this LGTM, thanks for taking the time to write this PR!

bjoernQ mentioned this pull request Jun 14, 2024

Add #[ram(persistent)] to replace uninitialized #1649

Closed

5 tasks

bjoernQ reviewed Jun 14, 2024

View reviewed changes

esp-hal/src/soc/esp32s3/mod.rs Outdated Show resolved Hide resolved

bjoernQ reviewed Jun 14, 2024

View reviewed changes

esp-hal/src/soc/esp32s3/mod.rs Show resolved Hide resolved

bjoernQ reviewed Jun 24, 2024

View reviewed changes

esp-hal-procmacros/src/lib.rs Outdated Show resolved Hide resolved

jessebraham approved these changes Jun 25, 2024

View reviewed changes

Improve #[ram] soundness

ccc7872

wetheredge force-pushed the ram-take-2 branch from 11f002c to ccc7872 Compare July 6, 2024 19:12

wetheredge marked this pull request as ready for review July 6, 2024 19:37

Allow Atomic* in #[ram(persistent)]

6ec6325

jessebraham requested a review from bjoernQ July 7, 2024 02:28

jessebraham approved these changes Jul 8, 2024

View reviewed changes

jessebraham requested review from MabezDev and removed request for bjoernQ July 8, 2024 13:55

MabezDev approved these changes Jul 10, 2024

View reviewed changes

Merge branch 'main' into ram-take-2

5737459

jessebraham added this pull request to the merge queue Jul 10, 2024

Merged via the queue into esp-rs:main with commit a2883ac Jul 10, 2024
30 checks passed

wetheredge deleted the ram-take-2 branch July 10, 2024 21:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Take 2 on `#[ram]` soundness #1677

Take 2 on `#[ram]` soundness #1677

wetheredge commented Jun 14, 2024 •

edited

Loading

wetheredge commented Jun 22, 2024

bjoernQ commented Jun 24, 2024

bjoernQ commented Jun 25, 2024

jessebraham left a comment

wetheredge commented Jun 26, 2024

bjoernQ commented Jun 26, 2024

wetheredge commented Jul 1, 2024

bjoernQ commented Jul 1, 2024

wetheredge commented Jul 3, 2024

bjoernQ commented Jul 3, 2024

wetheredge commented Jul 4, 2024

bjoernQ commented Jul 4, 2024

wetheredge commented Jul 6, 2024

wetheredge commented Jul 6, 2024

bjoernQ commented Jul 8, 2024

jessebraham left a comment •

edited

Loading

MabezDev left a comment

Take 2 on #[ram] soundness #1677

Take 2 on #[ram] soundness #1677

Conversation

wetheredge commented Jun 14, 2024 • edited Loading

Submission Checklist 📝

Extra:

Pull Request Details 📖

Description

Testing

Additional To-dos

wetheredge commented Jun 22, 2024

Footnotes

bjoernQ commented Jun 24, 2024

bjoernQ commented Jun 25, 2024

jessebraham left a comment

Choose a reason for hiding this comment

wetheredge commented Jun 26, 2024

bjoernQ commented Jun 26, 2024

wetheredge commented Jul 1, 2024

bjoernQ commented Jul 1, 2024

wetheredge commented Jul 3, 2024

bjoernQ commented Jul 3, 2024

wetheredge commented Jul 4, 2024

bjoernQ commented Jul 4, 2024

wetheredge commented Jul 6, 2024

wetheredge commented Jul 6, 2024

bjoernQ commented Jul 8, 2024

jessebraham left a comment • edited Loading

Choose a reason for hiding this comment

MabezDev left a comment

Choose a reason for hiding this comment

Take 2 on `#[ram]` soundness #1677

Take 2 on `#[ram]` soundness #1677

wetheredge commented Jun 14, 2024 •

edited

Loading

jessebraham left a comment •

edited

Loading