From a4470378fa7ac88e013760d17135b5958c96d1dc Mon Sep 17 00:00:00 2001 From: Jorge Aparicio Date: Wed, 23 Oct 2019 12:45:30 -0500 Subject: [PATCH] deal with rustc/LLVM bug around LLVM's `dereferenceable` and volatile ops --- rfcs/0000-deferenceable-volatile.md | 498 ++++++++++++++++++++++++++++ 1 file changed, 498 insertions(+) create mode 100644 rfcs/0000-deferenceable-volatile.md diff --git a/rfcs/0000-deferenceable-volatile.md b/rfcs/0000-deferenceable-volatile.md new file mode 100644 index 00000000..a4a4fb77 --- /dev/null +++ b/rfcs/0000-deferenceable-volatile.md @@ -0,0 +1,498 @@ +- Relevant components: `svd2rust` +- Feature Name: deferenceable_volatile +- Start Date: 2019-10-23 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary + +The current implementation of the peripheral API generated by `svd2rust` is +affected by bug [rust-lang/rust#55005]. The (hypothetical) observable effect of +this bug on the `svd2rust` API is that LLVM may insert spurious reads to MMIO +registers in optimized builds. These spurious reads in turn may cause some bits +of the register to be cleared (e.g. interrupt flags) or flipped, changing the +intended behavior of the program. + +[rust-lang/rust#55005]: https://github.com/rust-lang/rust/issues/55005 + +The way to prevent this rustc/LLVM bug is to *never* create a reference (`&-` +or `&mut-`) to a MMIO register, or in general to memory that is meant to be +accessed only through volatile operations. This document explores alternative +implementations of the `svd2rust` API to avoid this bug. + +# Current implementation + +To visualize the problem consider an hypothetical device with 2 similar USART +peripherals. Each USART peripheral has two registers: a 32-bit control register +(`CR`) and a 16-bit data register (`DR`); these registers are laid out next to +each other in memory; the `CR` registers appears first in memory (has the lower +address). + +In the current implementation these peripherals are represented using the +following types: + +``` rust +pub struct USART1 { _not_sync: PhantomData<*mut ()> } +pub struct USART2 { _not_sync: PhantomData<*mut ()> } + +impl Deref for USART1 { // repeat for USART2 + type Target = usart::RegisterBlock; + + fn deref(&self) -> &usart::RegisterBlock { + unsafe { &*(0x4000_0000 as *const usart::RegisterBlock) } + } +} + +// repeat for USART2 with address `0x4001_0000` + +mod usart { + #[repr(C)] + pub struct RegisterBlock { + pub CR: CR, // newtype over `VolatileCell` + pub DR: DR, // newtype over `VolatileCell` + } +} +``` + +The `VolatileCell` is a newtype over `UnsafeCell` with the following API: + +``` rust +struct VolatileCell { inner: UnsafeCell } + +impl VolatileCell { + unsafe fn read(&self) -> T { // it also has a `write` method + ptr::read_volatile(self.inner.get()) + } +} +``` + +The `UnsafeCell::get(&self)` method (implicitly) creates a shared reference to +an `UnsafeCell` and thus runs into rust-lang/rust#55005. Because +`VolatileCell::{read,write}` call `UnsafeCell::get` they are also affected by +the bug. + +Code like `&usart1.cr` (where `usart1: USART1`) may also trigger the bug because +it creates a shared reference to `CR`, which is a newtype over +`UnsafeCell`. + +# Detailed design + +The main problem with the current implementation is that `RegisterBlock` is a +sized type that *represents* a register block in MMIO memory. Therefore all +instances of `&any::RegisterBlock` are references (pointers) into MMIO memory +and will run into the bug. The logical solution is to replace the concept of +`RegisterBlock` with something else. + +This section explores two alternative implementations. + +## ZSTs all the way down + +One approach is to turn `RegisterBlock` into a zero sized type (ZST): +`Registers` (\*). This `Registers` structure will contain more ZSTs that +represent the registers of a peripheral. Each of these ZST registers will +implement today's API by directly calling `ptr::{read,write}_volatile_` on a +*raw pointer* -- this way references to MMIO memory are never created. The code +below illustrates the idea: + +(\*) we don't really have to change the name of the type in the actual +implementation but using different names makes the change easier to explain. + +``` rust +struct USART1 { _not_sync: PhantomData<*const ()> } + +pub mod usart { + pub struct Registers { + pub CR: CR, + pub DR: DR, + } + + pub struct CR { _not_send_or_sync: PhantomData<*const ()> } + + impl CR { + // simplified so it doesn't match the current svd2rust API + pub fn read(&self) -> u32 { + unsafe { ptr::read_volatile(0x4000_0000 as *const u32) } + } + } + + impl DR { + pub fn read(&self) -> u16 { + unsafe { ptr::read_volatile(0x4000_0004 as *const u16) } + // ^ ^^ + } + } +} +``` + +### Breaking changes + +In today's API one can write a generic function that works with peripherals +USART1 and USART2 using the following trait bound: `U: Deref`. Alternatively, one can use a concrete function that +takes an argument of type `&usart::RegisterBlock`; this function works with both +`&USART1` and `&USART2` because they implement the `Deref` trait. + +``` rust +fn generic(usart: U) where U: Deref { + usart.dr.read(); +} + +fn concrete(usart: &usart::RegisterBlock) { + usart.dr.read(); +} + +fn pass_usart1(usart: USART1) { + concrete(&usart); // (A) + generic(usart); +} + +fn pass_usart2(usart: USART2) { + // ^ + concrete(&usart); // (B) + generic(usart); +} +``` + +The ability to write the `concrete` function is lost when moving to the ZST +approach. The reason is that under this approach `&usart::Registers` is also a +ZST; in contrast, in the current implementation `&usart::RegisterBlock` is a +pointer into MMIO memory so in line (A) we are passing the pointer `0x4000_0000` +to `concrete` and in line (B) we are passing the pointer `0x4001_0000`. + +Under the ZST approach it's possible to write something like the `generic` +function but the implementation needs to become more complex: + +``` rust +trait Peripheral { + const BASE_ADDRESS: usize; +} + +pub type USART1 = usart::Registers; +pub type USART2 = usart::Registers; + +mod usart { + struct _1; + struct _2; + + // repeat for `_2` but with address = `0x4000_1000` + impl Peripheral for _1 { + const BASE_ADDRESS: usize = 0x4000_0000; + } + + pub struct Registers

+ where + P: crate::Peripheral, + { + pub CR: CR

, + pub DR: DR

, + } + + pub struct CR

+ where + P: Peripheral, + { + _p: PhantomData

,_ + _not_send: PhantomData<*mut ()>, + } + + // repeat for `DR

` but with `OFFSET=0x4` and `u16` instead of `u32` + impl

CR

+ where + P: Peripheral, + { + const OFFSET: usize = 0x0; + + pub fn read(&self) -> u32 { + unsafe { + ptr::read_volatile((P::BASE_ADDRESS + Self::OFFSET) as *const u32) + } + } + } + + impl

Registers

{ + pub(crate) unsafe fn new() -> Self { + Registers { + CR: CR { _p: PhantomData, _not_send: PhantomData_}, + // .. + } + } + } +} + +pub struct Peripherals { + pub USART1: USART1, + pub USART2: USART2, +} + +impl Peripherals { + pub fn take() -> Option { + // omitted: runtime check + + unsafe { + Some(Peripherals { + USART1: usart::Registers::new(), + USART2: usart::Registers::new(), + }) + } + } +} +``` + +Then generic code will look like this: + +``` rust +fn generic_zst

(usart: USART

) where P: Peripheral { + usart.dr.read(); +} +``` + +Furthermore, this ZST implementation with generics changes the signatures of +registers types: `CR<_1>` is the control register of USART1 and `CR<_2>` is the +control register of USART2. These two types are ZST so the exact address of the +register must be stored in the type system. In today's implementation, the `&CR` +value is a pointer, that points to the particular register (`0x4000_0000` or +`0x4000_1000**). + +In summary, the breaking changes *specific* to this approach are: + +- You can't write a `concrete` function that works with all instances of + a peripheral. + +- Generic functions that work with all instances of a peripheral need to be + modified (`generic` -> `generic_zst`). + +- The types of registers changes from a concrete type (`today::CR` ) to a + generic type (`zst::CR

`). Note that registers of peripherals that have only + a single instance do *not* need to be turned into generic types. + +## `VolAddress` + +Another approach is to not use ZSTs but instead store the address of each +register "on the stack" using the [`VolAddress`] type. The code below +illustrates the idea: + +[`VolAddress`]: https://docs.rs/voladdress/0.2.5/voladdress/struct.VolAddress.html + +``` rust +pub struct USART1(usart::Registers); +pub struct USART2(usart::Registers); + +impl Deref for USART1 { + type Target = usart::Registers; + + fn deref(&self) -> &Self::Target { + &self.0 + } +} + +mod usart { + pub struct Registers { + // simplified; these should be newtypes + pub CR: VolAddress, + pub DR: VolAddress, + } + + impl Registers { + pub(crate) unsafe fn new(base_address: usize) -> Self { + Registers { + CR: VolAddress::new(base_address + 0x0), + DR: VolAddress::new(base_address + 0x4), + } + } + } +} + +pub struct Peripherals { + pub USART1: USART1, + pub USART2: USART2, +} + +impl Peripherals { + pub fn take() -> Option { + // omitted: runtime check + + unsafe { + Some(Peripherals { + USART1: USART1(usart::Registers::new(0x4000_0000)), + USART2: USART2(usart::Registers::new(0x4000_1000)), + }) + } + } +} +``` + +### Breaking changes + +This approach has no *specific* breaking changes. The `concrete` and `generic` +that one can write today will continue to work under this alternative +implementation. The register type does not change in any visible way when +switching to the `VolAddress` approach. + +### Non-zero cost + +The `VolAddress` approach is not zero cost in comparison to today's +implementation and to the ZST approach. This can be observed when a peripheral +type (or an abstraction that contains a peripheral) is stored in a static +variable. This pattern is commonly used in RTFM applications (peripherals are +used as [late resources]) but also appears in `cortex-m-rt`-only applications +that use the `Mutex>>` pattern. + +[late resources]: https://rtfm.rs/0.5/book/en/by-example/resources.html#late-resources + +The following program demonstrates the non-zero-cost-ness. + +#### `VolAddress` + +``` rust +// a late resource +// (or this could be `static _: _Mutex>>`) +static mut USART1: MaybeUninit = MaybeUninit::uninit(); + +#[entry] +fn main() -> ! { + // simulate the initialization of a late resource + unsafe { + USART1 + .as_mut_ptr() + .write(usart::Registers::new(0x4000_0000)); + } + + loop {} +} + +#[exception] +fn SVCall() { + unsafe { + (*USART1.as_ptr()).DR.read(); + } +} + +#[exception] +fn SysTick() { + unsafe { + (*USART1.as_ptr()).CR.read(); + } +} +``` + +Disassembly (`arm-none-eabi-objdump`): + +``` armasm +00000400 : + 400: f240 0000 movw r0, #0 + 404: f2c2 0000 movt r0, #8192 ; 0x2000 + 408: 6840 ldr r0, [r0, #4] + 40a: 8800 ldrh r0, [r0, #0] + 40c: 4770 bx lr + +0000040e : + 40e: f240 0000 movw r0, #0 + 412: f2c2 0000 movt r0, #8192 ; 0x2000 + 416: 6800 ldr r0, [r0, #0] + 418: 6800 ldr r0, [r0, #0] + 41a: 4770 bx lr +``` + +Binary size (`arm-none-eabi-size`): + +``` text +app : +section size addr +(..) +.bss 0x8 0x20000000 +(..) +``` + +#### Current implementation + +``` rust +static mut USART1: MaybeUninit = MaybeUninit::uninit(); +// ^^^^^^ + +#[entry] +fn main() -> ! { + unsafe { + USART1.as_mut_ptr().write(USART1::steal()); + // ^^^^^^^^^^^^^^^ + } + + loop {} +} + +// SVCall and SysTick don't change +``` + +Disassembly (`arm-none-eabi-objdump`): + +``` armasm +00000400 : + 400: 2004 movs r0, #4 + 402: f2c4 0000 movt r0, #16384 ; 0x4000 + 406: 8800 ldrh r0, [r0, #0] + 408: 4770 bx lr + +0000040a : + 40a: f04f 4080 mov.w r0, #1073741824 ; 0x40000000 + 40e: 6800 ldr r0, [r0, #0] + 410: 4770 bx lr +``` + +Binary size (`arm-none-eabi-size`): + +``` text +app : +section size addr +(..) +.bss 0x0 0x20000000 +(..) +``` + +### ZST + +The ZST approach produces the same machine code as the current implementation. + +--- + +The difference boils down to `voladdress::USART1` *not* being a ZST. This causes +the register pointers (`VolAddress` values) to be stored in static memory; this +can use hundred of bytes of SRAM for peripherals that have many registers (each +register uses 4-byte of memory on a 32-bit device). Apart from the memory usage, +each register has indirection: the register address is first loaded from static +memory. + +## Common breaking changes + +It is not possible to implement the `RegisterBlock::ptr` API under any of the +alternative implementations. This API creates a raw pointer into the register +block, and thus into MMIO memory. This action by itself does not trigger the bug +(rust-lang/rust#55005) but in both alternative implementations the `Registers` +struct that replaces `RegisterBlock` no longer represents MMIO memory so `*mut +Registers*` is _not_ equivalent to `*mut RegisterBlock`. In the ZST approach, +`&Registers` is also a ZST; and in the `VolAddress` approach, `&Registers` is a +pointer into some stack-allocated value. + +# Alternatives + +## Do nothing + +An alternative is to keep the `svd2rust` API as it is and hope for +rust-lang/rust#55005 to be fixed upstream (e.g. by removing the `deferenceable` +attribute from `&UnsafeCell` when `rustc` lowers it to LLVM-IR). + +# Conclusion + +Three options have been explored in order to address bug rust-lang/rust#55005. + +- Keep the current implementation and hope the bug is fixed soon. + +- Re-implement the `svd2rust` API using the ZST approach. This approach is as + zero cost as the current implementation but causes 4 breaking changes in the + public API. + +- Re-implement the `svd2rust` API using the `VolAddress` approach. This approach + only causes one breaking change but it's not zero cost: it increases memory + usage and reduces performance in a common use case. + +If it is decided to use an alternative implementation it is also advisable to +deprecate the `vcell` and `volatile-register` crates and point current direct +users of those crates to the alternatives presented in this document.