Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take 2 on #[ram] soundness #1677

Merged
merged 3 commits into from
Jul 10, 2024
Merged

Take 2 on #[ram] soundness #1677

merged 3 commits into from
Jul 10, 2024

Conversation

wetheredge
Copy link
Contributor

@wetheredge wetheredge commented Jun 14, 2024

Submission Checklist 📝

  • I have updated existing examples or added new ones (if applicable).
  • I have used cargo xtask fmt-packages command to ensure that all changed code is formatted correctly.
  • My changes were added to the CHANGELOG.md in the proper section.
  • My changes are in accordance to the esp-rs API guidelines

Extra:

Pull Request Details 📖

Description

Closes #1110, closes #1650, closes #1649

New PR as this is a full rewrite. It also fixes #1650 since both #[ram(persistent)] and #[ram(zeroed)] require the same check that the type can take an all-zero bit pattern. I used bytemuck::Zeroable since it seems like the most widely used, rather than implement. It did mean adding bytemuck as a dependency of esp-hal, but I used version 1.0.0 to maximize compatibility with whatever version is probably already in users' dependency graphs.

Error message for usage on non-Zeroable types
error[E0277]: the trait bound `NonZero<u8>: bytemuck::zeroable::Zeroable` is not satisfied
   --> src/bin/ram.rs:40:18
    |
40  | static NON_ZERO: NonZeroU8 = NonZeroU8::MIN;
    |                  ^^^^^^^^^ the trait `bytemuck::zeroable::Zeroable` is not implemented for `NonZero<u8>`
    |
    = help: the following other types implement trait `bytemuck::zeroable::Zeroable`:
              bool
              char
              isize
              i8
              i16
              i32
              i64
              i128
            and 73 others
note: required by a bound in `assert_is_zeroable`
   --> …/esp-hal/esp-hal/src/lib.rs:229:40
    |
229 |     pub const fn assert_is_zeroable<T: bytemuck::Zeroable>() {}
    |                                        ^^^^^^^^^^^^^^^^^^ required by this bound in `assert_is_zeroable`

For more information about this error, try `rustc --explain E0277`.

If exposing the #[doc(hidden)] function name in the error message is undesirable, I use the method I did in #1649.

Testing

The updated ram example ran on an S3 dev board. I do not have any of the RISC-V ESPs, so someone else will need to test that once it's implemented.

Additional To-dos

  • Update esp-riscv-rt

@wetheredge
Copy link
Contributor Author

It will now only zero persistent ram after initial boot, since, I think1 that's the only time the RTC ram is not preserved. IMO, this should be expanded to all the reset reasons that could theoretically happen before the init had finished. It seems to me that that would be anything not caused by a watchdog or software (for S3: brown out, clock glitch, efuse error, and usb uart/jtag resets).

I also updated esp-riscv-rt to match the xtensa behavior. I don't have any risc-v esps to test on, but it did seem to work for esp32c3 in qemu.

Additionally, I noticed the rv-init-data and rv-init-rtc-data features (and the esp-riscv-rt features they enable) seem to have only been used for the old direct boot support. I also attempted to find uses of them in a GitHub-wide search and did not see any outside of forks of esp-hal. How do you feel about removing these features and perhaps merging the existing zero-rtc-fast-bss and new zero-rtc-fast-persistent into one rtc-ram feature?

Footnotes

  1. Based on the Reset and Clock chapter of the ESP32, ESP32-S3, and ESP32-C3 reference manuals

@bjoernQ
Copy link
Contributor

bjoernQ commented Jun 24, 2024

Additionally, I noticed the rv-init-data and rv-init-rtc-data features (and the esp-riscv-rt features they enable) seem to have only been used for the old direct boot support. I also attempted to find uses of them in a GitHub-wide search and did not see any outside of forks of esp-hal. How do you feel about removing these features and perhaps merging the existing zero-rtc-fast-bss and new zero-rtc-fast-persistent into one rtc-ram feature?

I tend to agree - maybe good to hear others opinion on that, too

@bjoernQ
Copy link
Contributor

bjoernQ commented Jun 25, 2024

Looks quite good overall - I guess after adding a CHANGELOG.md entry this should be good to go

One thing maybe worth considering would be to have a (doc-hidden) function in esp-hal like should_reset_rtc_persisted_ram (naming is debatable) which decides on the reset reasons for both - Xtensa and RISC-V. Would make it easier to change the decision when to zero memory (if we need to reconsider the condition). Not mandatory - just an idea

Copy link
Member

@jessebraham jessebraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking quite good to me, thanks for taking the time to tackle those issues. Once CHANGELOG.md has been updated I think this should be good to go. I'll give @bjoernQ the final say on this, though.

@wetheredge
Copy link
Contributor Author

Great. My last concern is with the edge case of an externally triggered RTC RAM-preserving reset (brown out, power glitch, jtag reset, etc.) occurring very early in the first boot such that the zero initialization gets skipped, causing undefined behavior. If this is a worry, which kinds of resets should be added to the list to trigger zeroing?

I wrote the condition in assembly to match the rest of the risc-v init, since I'm not sure how it interacts with the soundness issues that prompted the decision to avoid Rust there.

@bjoernQ
Copy link
Contributor

bjoernQ commented Jun 26, 2024

Probably all the reset reasons which are chip-reset or system-reset should always trigger zeroing:

(from C6 TRM)

image

I wrote the condition in assembly to match the rest of the risc-v init, since I'm not sure how it interacts with the soundness issues that prompted the decision to avoid Rust there.

Yes - makes sense. In theory when being cautious writing such a function it should be okay but sure - in assembly it's easier to know what really happens 👍

@wetheredge
Copy link
Contributor Author

If that looks good, I'll update the changelog and rebase.

The risc-v code could also use some testing on real chips—I've only been able to test on an S3 and qemu.

@bjoernQ
Copy link
Contributor

bjoernQ commented Jul 1, 2024

Seems like the description in the TRM fooled me a bit and "System Reset: resets the whole digital system, including LP system." doesn't mean LP/RTC RAM is reset. So only "Chip Reset: resets the whole chip." seems to be a reason to zero the memory. (Sorry for the confusion)

Maybe we should just assume "Brown-out system reset" is "Chip Reset" (it can also be System Reset according to the TRM) and only zero out memory for these two reasons

@wetheredge
Copy link
Contributor Author

I interpreted that to mean that a minor brown out could cause a system reset (preserving RTC RAM) and would give a reset code of 0x0F, while a more severe brownout could cause a chip reset with code 0x01. If it could give code 0x0F even after causing a chip reset, then I agree—definitely need to init after that.

I had expanded the list to guard against something like this:

  1. Initial boot starts (reset reason = ChipPowerOn)
  2. While the bootloader is running, something triggers a reset
  3. Second boot starts and completes with a non-ChipPowerOn reset reason, meaning persistent ram is not zeroed
  4. Undefined behavior

I am unsure which resets could occur at step 2, hence the long list. Perhaps anything that could trigger a reset that early would result in ChipPowerOn? Would a JTAG reset be possible that early?

Perhaps the time between a reset that could cause the above sequence becoming possible and the ram init finishing is simply so short that it should not be considered. I personally dislike that solution as a soundness-purist, but I'll follow your lead there.

@bjoernQ
Copy link
Contributor

bjoernQ commented Jul 3, 2024

t.b.h. I guess the above scenario is not completely impossible in a development setup but maybe unlikely in production - not too sure

At least not zeroing on e.g. WDT resets can be useful, I guess. On the other hand, if code is manipulating data stored in RTC RAM and gets interrupted by a WDT reset (or any other data preserving reset) the data might be invalid then. I start to wonder if there would be a 100% solution other than e.g. using checksums?

@wetheredge
Copy link
Contributor Author

Oh yeah, interrupted writes are an issue.

What about this?

  • Update the docs to warn about the potential for resets during writes and to avoid resets immediately after first boot
  • Require persistent statics to impl AnyBitPattern
  • Zero after ChipPowerOn and SysBrownOut
    • Maybe also some set of SysClkGlitch, CoreEfuseCrc, and/or CorePwrGlitch?

Also, from reading this ESP-IDF source, it looks like some cases of brownout resets are detected by the IDF, so esp-hal will report them as CoreSW. Could those be the system brown-out resets described in the TRM?


Shall I split the fix for #1650 out into its own PR so that isn't held up by finalizing these specifics?

@bjoernQ
Copy link
Contributor

bjoernQ commented Jul 4, 2024

Update the docs to warn about the potential for resets during writes and to avoid resets immediately after first boot

Yes, we probably should explain these things more

Require persistent statics to impl AnyBitPattern

That is basically just MaybeUninit, right? But I agree it's probably the only safe thing to do then

Zero after ChipPowerOn and SysBrownOut

Yes, sounds good.

Regarding splitting the PR: Ideally, I would love to see all of this in the next release - not sure if splitting out parts of the PR helps or causes more work for you. I'm fine with both I guess

@wetheredge
Copy link
Contributor Author

I noticed this note on SocResetReason::ChipResetReason:

In ESP-IDF this value (0x01) can also be ChipBrownOut or ChipSuperWdt, however that is not really compatible with Rust-style enums.

Combined with this part of the IDF, I think that a chip level brownout will not give SysBrownOut. Based on this assumption, I'm only initializing after ChipPowerOn.

That is basically just MaybeUninit, right?

I guess it's close, yeah. But this can be used without any unsafe:

#[ram(rtc_fast, persistent)]
static BOOT_MODE: AtomicU8 = AtomicU8::new(0);

match BOOT_MODE.load(order) {
	// ...
}

@wetheredge wetheredge marked this pull request as ready for review July 6, 2024 19:37
@wetheredge
Copy link
Contributor Author

Well, I missed the note in more recent versions of the AnyBitPattern docs that explicitly forbids interior mutability because of bytemuck::cast_ref, etc. I've replaced it with a new esp_hal::Persistable trait that matches the actual requirements for #[ram(persistent)]. It's implemented for a minimal set of integers, floats, portable_atomic::Atomic*, and arrays that should cover most usage.

@jessebraham jessebraham requested a review from bjoernQ July 7, 2024 02:28
@bjoernQ
Copy link
Contributor

bjoernQ commented Jul 8, 2024

Looks good but I'm on vacation this week -not sure if and when I will get to this before next week. So, if anyone likes to review this: feel free

Copy link
Member

@jessebraham jessebraham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry forgot about that 😅

@MabezDev would you mind taking a look at the changes to the assembly in riscv-rt? I think everything looks okay, just want another set of eyes on that specifically.

@jessebraham jessebraham requested review from MabezDev and removed request for bjoernQ July 8, 2024 13:55
Copy link
Member

@MabezDev MabezDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay on my end, this LGTM, thanks for taking the time to write this PR!

@jessebraham jessebraham added this pull request to the merge queue Jul 10, 2024
Merged via the queue into esp-rs:main with commit a2883ac Jul 10, 2024
30 checks passed
@wetheredge wetheredge deleted the ram-take-2 branch July 10, 2024 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

#[ram(zeroed)] statics are unsound #[ram(uninitialized)] statics are unsound
4 participants